Limits on Support Recovery with Probabilistic
Models: An Information-Theoretic Framework 

Jonathan Scarlett and Volkan Cevher 


Abstract: The support recovery problem consists of determining
a sparse subset of a set of variables that is relevant in
generating a set of observations, and arises in a diverse range
of settings such as compressive sensing, subset selection
in regression, and group testing. In this paper, we take a
unified approach to support recovery problems, considering
general probabilistic models relating a sparse data vector to an
observation vector. We study the information-theoretic limits
of both exact and partial support recovery, taking a novel
approach motivated by thresholding techniques in channel
coding. We provide general achievability and converse bounds
characterizing the trade-off between the error probability and
number of measurements, and we specialize these to the linear,
1-bit, and group testing models. In several cases, our bounds
not only provide matching scaling laws in the necessary and
sufficient number of measurements, but also sharp thresholds
with matching constant factors. Our approach has several
advantages over previous approaches: For the achievability
part, we obtain sharp thresholds under broader scalings of
the sparsity level and other parameters (e.g., signal-to-noise
ratio) compared to several previous works, and for the converse
part, we not only provide conditions under which the error
probability fails to vanish, but also conditions under which it
tends to one.

Index Terms: Support recovery, sparsity pattern recovery,
information-theoretic limits, compressive sensing, non-linear
models, 1-bit compressive sensing, group testing, phase
transitions, strong converse


I. Introduction 

The support recovery problem consists of determining a
sparse subset of a set of variables that is relevant in producing
a set of observations, and arises frequently in disciplines
such as group testing [?], compressive sensing (CS) [?],
and subset selection in regression [?]. The observation
models can vary significantly among these disciplines, and
it is of considerable interest to consider these in a unified
fashion. This can be done via probabilistic models relating
the sparse vector $\beta \in \mathbb{R}^p$ to a single observation
$Y \in \mathbb{R}$ in the following manner:

$$(Y \mid S = s, X = x, \beta = b) \sim P_{Y|X_S\beta_S}(\,\cdot \mid x_s, b_s), \tag{1}$$

where $S \subseteq \{1,\dots,p\}$ represents the set of relevant
variables, $X \in \mathbb{R}^p$ is a measurement vector, $X_S$ (respectively,
$\beta_S$) is the subvector of $X$ (respectively, $\beta$) containing the
entries indexed by $S$, and $P_{Y|X_S\beta_S}$ is a given probability
distribution. Given a collection of measurements $\mathbf{Y} \in \mathbb{R}^n$
and the corresponding measurement matrix $\mathbf{X} \in \mathbb{R}^{n \times p}$ (with
each row containing a single measurement vector), the goal
is to find the conditions under which the support $S$ can
be recovered either perfectly or partially. In this paper,
we study the information-theoretic limits for this problem,
characterizing the number of measurements $n$ required in
terms of the sparsity level $k$ and ambient dimension $p$
regardless of the computational complexity. Such studies are
useful for assessing the performance of practical techniques
and determining to what extent improvements are possible.

Before proceeding, we state some important examples of 
models that are captured by (1).

Linear Model: The linear model [?] is ubiquitous
in signal processing, statistics, and machine learning, and
in itself covers an extensive range of applications. Each
observation takes the form

$$Y = \langle X, \beta \rangle + Z, \tag{2}$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product, and $Z$ is additive
noise. An important quantity in this setting is the
signal-to-noise ratio (SNR), and in the context of support
recovery, the smallest non-zero absolute value $\beta_{\min}$ in $\beta$ has
also been shown to play a key role [?].

Quantized Linear Models: Quantized variants of the
linear model are of significant interest in applications with
hardware limitations. An example that we will consider in
this paper is the 1-bit model [?], given by

$$Y = \mathrm{sign}(\langle X, \beta \rangle + Z), \tag{3}$$

where the sign function equals 1 if its argument is
non-negative, and $-1$ if it is negative.

Group Testing: Studies of group testing problems began
several decades ago [10], [11], and have recently regained
significant attention [?], with applications including
medical testing, database systems, computational biology,
and fault detection. The goal is to determine a small number
of “defective” items within a larger subset of items. The
items involved in a single test are indicated by $X \in \{0,1\}^p$,
and each observation takes the form

$$Y = \mathbb{1}\Big\{\bigcup_{i \in S} \{X_i = 1\}\Big\} \oplus Z, \tag{4}$$

with $S$ representing the defective items, $Y$ indicating
whether the test contains at least one defective item, and $Z$
representing possible noise (here $\oplus$ denotes modulo-2
addition). In this setting, one can think of $\beta$ as deterministically
having entries equaling one on $S$, and zero on $S^c$.

The above examples highlight that (1) captures both
discrete and continuous models. Beyond these examples,
several other non-linear models are captured by (1),
including the logistic, Poisson, and gamma models.
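To make the three observation models concrete, the following minimal Python sketch draws a single observation from each of (2), (3) and (4). It is only an illustration under stated assumptions: the function names, the Gaussian noise in the linear and 1-bit models, the Bernoulli noise in group testing, and all parameter values are hypothetical choices, and indices are 0-based rather than 1-based.

```python
import random


def linear_obs(x, beta, noise_std, rng):
    """Linear model (2): Y = <x, beta> + Z, assuming Z ~ N(0, noise_std^2)."""
    return sum(xi * bi for xi, bi in zip(x, beta)) + rng.gauss(0.0, noise_std)


def one_bit_obs(x, beta, noise_std, rng):
    """1-bit model (3): sign of the noisy inner product, with sign(0) = +1."""
    return 1 if linear_obs(x, beta, noise_std, rng) >= 0 else -1


def group_test_obs(x, support, flip_prob, rng):
    """Group testing (4): OR over the defective items in the test, XOR noise.

    The noise Z is assumed to be Bernoulli(flip_prob).
    """
    clean = int(any(x[i] == 1 for i in support))
    z = int(rng.random() < flip_prob)
    return clean ^ z


rng = random.Random(0)
p, support = 8, [1, 4]          # ambient dimension and a support of size k = 2
beta = [0.0] * p
for i in support:
    beta[i] = 1.0               # non-zero entries only on the support

x_gauss = [rng.gauss(0.0, 1.0) for _ in range(p)]    # Gaussian measurement vector
x_bern = [int(rng.random() < 0.3) for _ in range(p)]  # Bernoulli test design

y_lin = linear_obs(x_gauss, beta, 0.5, rng)
y_bit = one_bit_obs(x_gauss, beta, 0.5, rng)
y_gt = group_test_obs(x_bern, support, 0.0, rng)      # noiseless test outcome
```

In all three cases the observation depends on the measurement vector only through the entries indexed by the support, which is the structure captured by (1).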


A. Previous Work and Contributions 


Numerous previous works on the information-theoretic
limits of support recovery have focused on the linear model
[?]. The main aim of these works, and of
the present paper, is to develop necessary and sufficient
conditions for which an “error probability” vanishes as
$p \to \infty$. However, there are several distinctions that can
be made, including:


• Random measurement matrices [?] vs. arbitrary
measurement matrices [?];

• Exact support recovery [?] vs. partial
support recovery [?];

• Minimax characterizations for $\beta$ in a given class [?]
vs. average performance bounds for random $\beta$ [?].


Perhaps the most widely-studied combination of these is
that of minimax characterizations for exact support recovery
with random measurement matrices. In this setting, within
the class of vectors $\beta$ whose non-zero entries have an
absolute value exceeding some threshold $\beta_{\min}$, necessary
and sufficient conditions on $n$ are available with matching
scaling laws [?]. See also [?] for information-theoretic
studies of the linear model with a mean square
error criterion.

Compared to the linear model, research on the
information-theoretic limits of support recovery for non-linear
models is relatively scarce. The system model that we
have adopted follows those of a line of works seeking mutual
information characterizations of sparsity problems [?],
though we make use of significantly different
analysis techniques. Similarly to these works, we focus on
random measurement matrices and random non-zero entries
of $\beta$. Other works considering non-linear models have used
vastly different approaches such as regularized M-estimators
[?] and approximate message passing [28].

High-level Contributions: We consider an approach
using thresholding techniques akin to those used in
information-spectrum methods [29], thus providing a new
alternative to previous approaches based on
maximum-likelihood decoding and Fano’s inequality. Our key
contributions and the advantages of our framework are as follows:


1. Considering both exact and partial support recovery, we
provide non-asymptotic performance bounds applying
to general probabilistic models, along with a procedure
for applying them to specific models (cf. Section III-B).

2. We explicitly provide the constant factors in our
bounds, allowing for more precise characterizations
of the performance compared to works focusing on
scaling laws (e.g., see [?], [20]). In several cases,
the resulting necessary and sufficient conditions on the
number of measurements coincide up to a multiplicative
$1 + o(1)$ term, thus providing exact asymptotic
thresholds (sometimes referred to as phase transitions
[?], [30]) on the number of measurements.

3. As evidenced in our examples outlined below, our
framework often leads to such exact or near-exact
thresholds for significantly more general scalings of $k$,
SNR, etc. compared to previous works.

4. The majority of previous works have developed
converse results using Fano’s inequality, leading to
necessary conditions for $\mathbb{P}[\mathrm{error}] \to 0$. In contrast,
our converse results provide necessary conditions for
$\mathbb{P}[\mathrm{error}] \not\to 1$. The distinction between these two
conditions is important from a practical perspective: One
may not expect a condition such as $\mathbb{P}[\mathrm{error}] \ge 10^{-10}$
to be significant, whereas the condition $\mathbb{P}[\mathrm{error}] \to 1$
is inarguably so.


Contributions for Specific Models: An overview of our
bounds for specific models is given in Table I, where we
state the derived bounds with the asymptotically negligible
terms omitted. All of the models and their parameters are
defined precisely in Section IV; in particular, the functions
$f_1, \dots, f_8$ and the remainder terms ($\Delta_1$, $\Delta_8$) are given
explicitly, and are easy to evaluate. We proceed by discussing
these contributions in more detail, and comparing them to
various existing results in the literature:


1. (Linear model) In the case of exact recovery, we
recover the exact thresholds on the required number
of measurements given by Jin et al. [?], as well
as handling a broader range of scalings of
$\beta_{\min} := \min\{|\beta_i| : \beta_i \ne 0\}$ (see Section IV-A for details)
and strengthening the converse by considering the more
stringent condition $\mathbb{P}[\mathrm{error}] \to 1$. Our results for
partial recovery provide near-matching necessary and
sufficient conditions under scalings with $k = o(p)$,
thus complementing the extensive study of the scaling
$k = \Theta(p)$ by Reeves and Gastpar [?].

2. (1-bit model) We provide two surprising observations
regarding the 1-bit model: One of our corollaries provides a
low-SNR setting where the quantization only increases the
asymptotic number of measurements by a factor of $\frac{\pi}{2}$,
whereas another provides a high-SNR setting where
the scaling law is strictly worse than the linear model.
Similar behavior will be observed for partial recovery
by numerically comparing the
bounds for various SNR values.

3. (Group testing) Asymptotic thresholds for group testing
with $k = \Theta(1)$ were given previously by Malyutov [?]
and Atia and Saligrama [?]. However, for
the case that $k \to \infty$, the sufficient conditions of
[?] introduced additional logarithmic factors. In
contrast, we obtain matching $\Theta\big(k \log\frac{p}{k}\big)$ scaling laws
for any sublinear scaling of the form $k = \Theta(p^\theta)$
($\theta \in (0,1)$). Moreover, for sufficiently small $\theta$ we
obtain exact thresholds. In particular, for the noiseless
setting we show that $n \approx k \log_2 \frac{p}{k}$ measurements are
both necessary and sufficient for $\theta \le \frac{1}{3}$. This is in
fact the same threshold as that for adaptive group
testing [?], thus proving that non-adaptive Bernoulli
measurement matrices are asymptotically optimal even
when adaptivity is allowed; this was previously known
only in the limit as $\theta \to 0$ [?]. For the noisy case, we
prove an analogous claim for sufficiently small $\theta$. A
shortened and simplified version of this paper focusing
exclusively on group testing can be found in [?].

4. (General discrete observations) Our converse for the
case of general discrete observations
recovers that of Tan and Atia [25] for the case that
$\beta_s$ is fixed, strengthens it due to a smaller remainder
term $\Delta_8$, and provides a generalization to the case that
$\beta_s$ is random.

Table I: Overview of main results for exact or partial support recovery under various observation models. In the necessary
and sufficient number of measurements, asymptotically negligible terms have been omitted. All quantities are defined
precisely in Section IV.

| Model | Result | Parameters | Distributions | Sufficient $n$ for $\mathbb{P}[\mathrm{error}] \to 0$ | Necessary $n$ for $\mathbb{P}[\mathrm{error}] \not\to 1$ |
|---|---|---|---|---|---|
| Linear | Cor. [?] | $k = o(p)$ | Discrete $\beta_s$, Gaussian $X$ | $\max_{\ell=1,\dots,k} \frac{(1+\Delta_1)\log\binom{p-k}{\ell}}{f_1(\ell)}$ ($\Delta_1 \to 0$ for various scalings) | $\max_{\ell=1,\dots,k} \frac{\log\binom{p-k+\ell}{\ell}}{f_1(\ell)}$ |
| Linear | Cor. [?] | $k \to \infty$, $k = o(p)$; partial recovery of proportion $1-\alpha^*$ | Gaussian $\beta_s$, Gaussian $X$ | $\max_{\alpha\in[\alpha^*,1]} \frac{\alpha k \log\frac{p}{k}}{f_2(\alpha)}$ | $\max_{\alpha\in[\alpha^*,1]} \frac{(\alpha-\alpha^*) k \log\frac{p}{k}}{f_2(\alpha)}$ |
| 1-bit | Cor. [?] | $k = \Theta(1)$; low SNR | Discrete $\beta_s$, Gaussian $X$ | $\max_{\ell=1,\dots,k} \frac{\ell \log p}{f_3(\ell)}$ (within a factor $\frac{\pi}{2}$ of linear model) | $\max_{\ell=1,\dots,k} \frac{\ell \log p}{f_3(\ell)}$ (within a factor $\frac{\pi}{2}$ of linear model) |
| 1-bit | Cor. [?] | $k = \Theta(p)$; high SNR | Fixed $\beta_s$, Gaussian $X$ | - | $\Omega(p\sqrt{\log p})$ (compared to $\Theta(p)$ for linear model) |
| 1-bit | Cor. [?] | $k \to \infty$, $k = o(p)$; partial recovery of proportion $1-\alpha^*$ | Gaussian $\beta_s$, Gaussian $X$ | $\max_{\alpha\in[\alpha^*,1]} \frac{\alpha k \log\frac{p}{k}}{f_5(\alpha)}$ | $\max_{\alpha\in[\alpha^*,1]} \frac{(\alpha-\alpha^*) k \log\frac{p}{k}}{f_5(\alpha)}$ |
| Group testing | Cor. [?] | $k = \Theta(p^\theta)$ | Fixed $\beta_s$, Bernoulli $X$ | $\frac{k\log\frac{p}{k}}{f_6(\theta)}$ ($f_6(\theta) = \log 2$ for $\theta \le \frac{1}{3}$) | $\frac{k\log\frac{p}{k}}{\log 2}$ |
| Group testing | Cor. [?] | $k = \Theta(p^\theta)$; noisy (crossover probability $\rho$) | Fixed $\beta_s$, Bernoulli $X$ | $\frac{k\log\frac{p}{k}}{f_7(\theta)}$ ($f_7(\theta) = \log 2 - H_2(\rho)$ for small $\theta$) | $\frac{k\log\frac{p}{k}}{\log 2 - H_2(\rho)}$ |
| Group testing | Cor. [?] | $k \to \infty$, $k = o(p)$; partial recovery of proportion $1-\alpha^*$ | Fixed $\beta_s$, Bernoulli $X$ | $\frac{k\log\frac{p}{k}}{\log 2 - H_2(\rho)}$ | $\frac{(1-\alpha^*)\,k\log\frac{p}{k}}{\log 2 - H_2(\rho)}$ |
| General discrete observations | Cor. [?] | Arbitrary | Arbitrary | - | $\max_{\ell=1,\dots,k} \frac{\log\binom{p-k+\ell}{\ell}}{f_8(\ell) + \Delta_8}$ |
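As a quick numerical illustration of the group testing thresholds quoted above, the sketch below evaluates the leading-order expression $k\log\frac{p}{k} / (\log 2 - H_2(\rho))$, which reduces to $k \log_2\frac{p}{k}$ in the noiseless case $\rho = 0$. The function names are hypothetical, and the output is only the asymptotic leading term: it ignores the lower-order terms that Table I omits, and says nothing about the range of $\theta$ for which the threshold is exact.

```python
import math


def binary_entropy(rho):
    """Binary entropy H_2(rho) in nats, with H_2(0) = H_2(1) = 0."""
    if rho in (0.0, 1.0):
        return 0.0
    return -rho * math.log(rho) - (1.0 - rho) * math.log(1.0 - rho)


def gt_threshold(p, k, rho=0.0):
    """Leading-order number of tests k*log(p/k) / (log 2 - H_2(rho)).

    rho is the crossover probability of the noise; rho = 0 gives
    k * log2(p / k), the noiseless expression.
    """
    return k * math.log(p / k) / (math.log(2.0) - binary_entropy(rho))


noiseless = gt_threshold(10**4, 10)          # k * log2(p / k)
noisy = gt_threshold(10**4, 10, rho=0.1)     # larger, since log 2 - H_2(rho) < log 2
```

The noisy value exceeds the noiseless one by the factor $\log 2 / (\log 2 - H_2(\rho))$, which grows without bound as $\rho \to \frac{1}{2}$.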

B. Structure of the Paper 

In Section II, we introduce our system model. In Section III,
we present our main non-asymptotic achievability and
converse results for general observation models, and the
procedure for applying them to specific problems. Several
applications of our results to specific models are presented
in Section IV. The proofs of the general bounds are given
in Section V, and conclusions are drawn in Section VI.

C. Notation 

We use upper-case letters for random variables, and
lower-case variables for their realizations. A non-bold character
may be a scalar or a vector, whereas a bold character refers
to a collection of $n$ scalars (e.g., $\mathbf{Y} \in \mathbb{R}^n$) or vectors (e.g.,
$\mathbf{X} \in \mathbb{R}^{n \times p}$). We write $\beta_S$ to denote the subvector of $\beta$
containing the entries indexed by $S$, and $\mathbf{X}_S$ to denote the submatrix of
$\mathbf{X}$ containing the columns indexed by $S$. The complement
with respect to $\{1,\dots,p\}$ is denoted by $(\cdot)^c$.

The symbol $\sim$ means “distributed as”. For a given joint
distribution $P_{XY}$, the corresponding marginal distributions
are denoted by $P_X$ and $P_Y$, and similarly for conditional
marginals (e.g., $P_{Y|X}$). We write $\mathbb{P}[\cdot]$ for probabilities, $\mathbb{E}[\cdot]$
for expectations, and $\mathrm{Var}[\cdot]$ for variances. We use usual
notations for the entropy (e.g., $H(X)$) and mutual
information (e.g., $I(X;Y)$), and their conditional counterparts
(e.g., $H(X|Z)$, $I(X;Y|Z)$). Note that $H$ may also denote
the differential entropy for continuous random variables; the
distinction will be clear from the context. We define the
binary entropy function $H_2(\rho) := -\rho\log\rho - (1-\rho)\log(1-\rho)$,
and the Q-function $Q(x) := \mathbb{P}[W > x]$ ($W \sim N(0,1)$).

We make use of the standard asymptotic notations $O(\cdot)$,
$o(\cdot)$, $\Theta(\cdot)$, $\Omega(\cdot)$ and $\omega(\cdot)$. We define the function
$[\cdot]^+ = \max\{0, \cdot\}$, and write the floor function as $\lfloor\cdot\rfloor$. The function
log has base $e$.

II. Problem Setup 
A. Model and Assumptions 

Recall that $p$ denotes the ambient dimension, $k$ denotes
the sparsity level, and $n$ denotes the number of
measurements. We let $\mathcal{S}$ be the set of subsets of $\{1,\dots,p\}$ having
cardinality $k$. The key random variables in our setup are the
support set $S \in \mathcal{S}$, the data vector $\beta \in \mathbb{R}^p$, the measurement
matrix $\mathbf{X} \in \mathbb{R}^{n \times p}$, and the observation vector $\mathbf{Y} \in \mathbb{R}^n$.

The support set $S$ is assumed to be equiprobable on
the subsets within $\mathcal{S}$. Given $S$, the entries of $\beta_{S^c}$
are deterministically set to zero, and the remaining entries
are generated according to some distribution $\beta_S \sim P_{\beta_S}$.
We assume that these non-zero entries follow the same
distribution for all of the possible realizations of $S$, and
that this distribution is permutation-invariant.

The measurement matrix $\mathbf{X}$ is assumed to have i.i.d.
values on some distribution $P_X$. We write $P_X^{n \times p}$ to denote
the corresponding i.i.d. distributions for matrices, and we
write $P_X^n$ as a shorthand for $P_X^{n \times 1}$. Given $S$, $\mathbf{X}$, and $\beta$,
each entry of the observation vector $\mathbf{Y}$ is generated in a
conditionally independent manner, with the $i$-th entry $Y^{(i)}$
distributed according to

$$(Y^{(i)} \mid S = s, X^{(i)} = x^{(i)}, \beta = b) \sim P_{Y|X_S\beta_S}(\,\cdot \mid x_s^{(i)}, b_s) \tag{5}$$

for some conditional distribution $P_{Y|X_S\beta_S}$. We again assume
symmetry with respect to $S$, namely, that $P_{Y|X_S\beta_S}$ does not
depend on the specific realization, and that the distribution
is invariant when the columns of $\mathbf{X}_S$ and the entries of $\beta_S$
undergo a common permutation.

Given $\mathbf{X}$ and $\mathbf{Y}$, a decoder forms an estimate $\hat{S}$ of $S$.
Similarly to previous works studying information-theoretic
limits on support recovery, we assume that the decoder
knows the system model. We consider two related
performance measures. In the case of exact support recovery, the
error probability is given by

$$P_e := \mathbb{P}[\hat{S} \ne S], \tag{6}$$

and is taken with respect to the realizations of $S$, $\beta$, $\mathbf{X}$,
and $\mathbf{Y}$; the decoder is assumed to be deterministic. We also
consider a less stringent performance criterion requiring that
only $k - d_{\max}$ entries of $S$ are successfully recovered, for
some $d_{\max} \in \{1,\dots,k-1\}$. Following [?], the error
probability is given by

$$P_e(d_{\max}) := \mathbb{P}\big[\{|S \setminus \hat{S}| > d_{\max}\} \cup \{|\hat{S} \setminus S| > d_{\max}\}\big]. \tag{7}$$

Note that if both $S$ and $\hat{S}$ have cardinality $k$ with probability
one, then the two events in the union are identical, and hence
either of the two can be removed.
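The two error events in (6) and (7) are simple set comparisons, as the following minimal Python sketch shows. The function names are hypothetical; in a simulation, the error probabilities $P_e$ and $P_e(d_{\max})$ would be estimated by averaging these indicators over independent draws of $(S, \beta, \mathbf{X}, \mathbf{Y})$.

```python
def exact_error(s_hat, s):
    """Indicator of the exact-recovery error event in (6)."""
    return set(s_hat) != set(s)


def partial_error(s_hat, s, d_max):
    """Indicator of the partial-recovery error event in (7).

    An error is declared if more than d_max true indices are missed,
    or more than d_max estimated indices are false positives.
    """
    s_hat, s = set(s_hat), set(s)
    return len(s - s_hat) > d_max or len(s_hat - s) > d_max


# One wrong index out of k = 3: an exact-recovery error, but tolerated
# by the partial criterion with d_max = 1.
truth, estimate = {1, 2, 3}, {1, 2, 4}
is_exact_error = exact_error(estimate, truth)
is_partial_error = partial_error(estimate, truth, d_max=1)
```

When both sets have cardinality $k$, the two set differences have equal size, which is why either event in the union can be dropped.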

For clarity, we formally state our main assumptions as
follows:

[A1] The support set $S$ is uniform on the subsets of
$\{1,\dots,p\}$ of size $k$, and the measurement matrix $\mathbf{X}$ is
i.i.d. on some distribution $P_X$.
[A2] The non-zero entries $\beta_S$ are distributed according to
$P_{\beta_S}$, and this distribution is permutation-invariant and
the same for all realizations of $S$.
[A3] The observation vector $\mathbf{Y}$ is conditionally i.i.d.
according to $P_{Y|X_S\beta_S}$, and this distribution is the same
for all realizations of $S$, and invariant to common
permutations of the columns of $\mathbf{X}_S$ and entries of $\beta_S$.
[A4] The decoder is given $(\mathbf{X}, \mathbf{Y})$, and also knows the
system model including $k$, $P_{Y|X_S\beta_S}$, and $P_{\beta_S}$.

Footnote 1: Extensions to more general alphabets beyond $\mathbb{R}$ are straightforward.

Our main goal is to derive necessary and sufficient
conditions on $n$ and $k$ (as functions of $p$) such that $P_e$ or
$P_e(d_{\max})$ vanishes as $p \to \infty$. Moreover, when considering
converse results, we will not only be interested in conditions
under which $P_e \not\to 0$, but also conditions under which the
stronger statement $P_e \to 1$ holds.

In particular, we introduce the terminology that the strong
converse holds if there exists a sequence of values $n^*$,
indexed by $p$, such that for all $\eta > 0$, we have $P_e \to 0$ when
$n \ge n^*(1+\eta)$, and $P_e \to 1$ when $n \le n^*(1-\eta)$. This is
related to the notion of a phase transition [?], [30]. More
generally, we will refer to conditions under which $P_e \to 1$
as strong impossibility results, not necessarily requiring
matching achievability bounds. That is, the strong converse
conclusively gives a sharp threshold between failure and
success, whereas a strong impossibility result may not.

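It will prove convenient to work with random variables
that are implicitly conditioned on a fixed value of $S$, say
$s = \{1,\dots,k\}$. We write $P_{\beta_s}$ and $P_{Y|X_s\beta_s}$ in place of $P_{\beta_S}$
and $P_{Y|X_S\beta_S}$ to emphasize that $S = s$. Moreover, we define
the corresponding joint distribution

$$P_{\beta_s X_s Y}(b_s, x_s, y) := P_{\beta_s}(b_s)\, P_X^{k}(x_s)\, P_{Y|X_s\beta_s}(y \mid x_s, b_s) \tag{8}$$

and its multiple-observation counterpart

$$P_{\beta_s \mathbf{X}_s \mathbf{Y}}(b_s, \mathbf{x}_s, \mathbf{y}) := P_{\beta_s}(b_s)\, P_X^{n \times k}(\mathbf{x}_s)\, P^n_{Y|X_s\beta_s}(\mathbf{y} \mid \mathbf{x}_s, b_s), \tag{9}$$

where $P^n_{Y|X_s\beta_s}(\cdot \mid \cdot, b_s)$ is the $n$-fold product of
$P_{Y|X_s\beta_s}(\cdot \mid \cdot, b_s)$.

Except where stated otherwise, the random variables
$(\beta_s, X_s, Y)$ and $(\beta_s, \mathbf{X}_s, \mathbf{Y})$ appearing throughout this
paper are distributed as

$$(\beta_s, X_s, Y) \sim P_{\beta_s X_s Y} \tag{10}$$

$$(\beta_s, \mathbf{X}_s, \mathbf{Y}) \sim P_{\beta_s \mathbf{X}_s \mathbf{Y}}, \tag{11}$$

with the remaining entries of the measurement matrix being
distributed as $\mathbf{X}_{s^c} \sim P_X^{n \times (p-k)}$, and with $\beta_{s^c} = 0$
deterministically. That is, we condition on a fixed $S = s$ except
where stated otherwise.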

For notational convenience, the main parts of our analysis
are presented with $P_{\beta_s}$, $P_X$ and $P_{Y|X_s\beta_s}$ representing
probability mass functions (PMFs), and with the corresponding
averages written using summations. However, except where
stated otherwise, our analysis is directly applicable to the case
that these distributions instead represent probability density
functions (PDFs), with the summations replaced by
integrals where necessary. The same applies to mixed
discrete-continuous distributions.





B. Information-Theoretic Definitions 

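Before introducing the required definitions for support
recovery, it is instructive to discuss thresholding techniques
in channel coding studies. These commenced in early works
such as [?], and have recently been used extensively
in information-spectrum methods [29], [?].

1) Channel Coding: We first recall the mutual
information, which is ubiquitous in information theory:

$$I(X;Y) := \sum_{x,y} P_{XY}(x,y) \log\frac{P_{Y|X}(y|x)}{P_Y(y)}. \tag{12}$$

In deriving asymptotic and non-asymptotic performance
bounds, it is common to work directly with the logarithm,

$$\imath(x;y) := \log\frac{P_{Y|X}(y|x)}{P_Y(y)}, \tag{13}$$

which is commonly known as the information density.
The thresholding techniques work by manipulating
probabilities of events of the form $\imath^n(\mathbf{x};\mathbf{y}) \ge \gamma$ and
$\imath^n(\mathbf{x};\mathbf{y}) < \gamma$, where $\imath^n(\mathbf{x};\mathbf{y}) := \sum_{i=1}^n \imath(x_i;y_i)$. For the former, one can perform
a change of measure from the conditional distribution of $\mathbf{Y}$
given $\mathbf{X}$ to the unconditional distribution of $\mathbf{Y}$, with a
multiplicative constant $e^{-\gamma}$ (since $P_{\mathbf{Y}}(\mathbf{y}) \le e^{-\gamma} P_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})$
on this event). For the latter, one can similarly
perform a change of measure from $\mathbf{Y}$ to $(\mathbf{Y}|\mathbf{X})$. Hence, in
both cases, there is a simple relation between the conditional
and unconditional probabilities of the output sequences.

Using these methods, one can get upper and lower bounds
on the error probability such that the dominant term is

$$\mathbb{P}\bigg[\frac{1}{n}\sum_{i=1}^n \imath(X_i;Y_i) \le R + \zeta_n\bigg] \tag{14}$$

for some $\zeta_n = o(1)$, with $R$ denoting the coding rate. Assuming that
$\{(X_i,Y_i)\}_{i=1}^n$ has some
form of i.i.d. structure, one can analyze this expression using
tools from probability theory. The law of large numbers
yields the channel capacity $C = \max_{P_X} I(X;Y)$, and
refined characterizations can be obtained using variations
of the central limit theorem [?].

Among the channel coding literature, our analysis is most
similar to that of mixed channels [29, Sec. 3.3], where
the relation between the input and output sequences is not
i.i.d., but instead conditionally i.i.d. given another random
variable. In our setting, $\beta_s$ will play the role of this random
variable. See Figure 1 for a depiction of this connection.

Figure 1: Connection between support recovery and coding
over a mixed channel.

2) Support Recovery: As in [?], we will consider
partitions of the support set $s \in \mathcal{S}$ into two sets $s_{\mathrm{dif}} \ne \emptyset$
and $s_{\mathrm{eq}}$. As will be seen in the proofs, $s_{\mathrm{eq}}$ will typically
correspond to an overlap between $s$ and some other set $\bar{s}$
(i.e., $s \cap \bar{s}$), whereas $s_{\mathrm{dif}}$ will correspond to the indices in
one set but not the other (e.g., $s \setminus \bar{s}$). There are $2^k - 1$ ways
of performing such a partition with $s_{\mathrm{dif}} \ne \emptyset$.

For fixed $s \in \mathcal{S}$ and a corresponding pair $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$, we
introduce the notation

$$P_{\mathbf{Y}|\mathbf{X}_{s_{\mathrm{dif}}}\mathbf{X}_{s_{\mathrm{eq}}}}(\mathbf{y} \mid \mathbf{x}_{s_{\mathrm{dif}}}, \mathbf{x}_{s_{\mathrm{eq}}}) := P_{\mathbf{Y}|\mathbf{X}_s}(\mathbf{y} \mid \mathbf{x}_s) \tag{15}$$

$$P_{Y|X_{s_{\mathrm{dif}}}X_{s_{\mathrm{eq}}}\beta_s}(y \mid x_{s_{\mathrm{dif}}}, x_{s_{\mathrm{eq}}}, b_s) := P_{Y|X_s\beta_s}(y \mid x_s, b_s), \tag{16}$$

where $P_{\mathbf{Y}|\mathbf{X}_s}$ is the marginal distribution of (9). While the
left-hand sides of (15)-(16) represent the same quantities for
any such $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$, it will still prove convenient to work
with these in place of the right-hand sides. In particular, this
allows us to introduce the marginal distributions

$$P_{\mathbf{Y}|\mathbf{X}_{s_{\mathrm{eq}}}}(\mathbf{y} \mid \mathbf{x}_{s_{\mathrm{eq}}}) := \sum_{\mathbf{x}_{s_{\mathrm{dif}}}} P_X^{n \times \ell}(\mathbf{x}_{s_{\mathrm{dif}}})\, P_{\mathbf{Y}|\mathbf{X}_{s_{\mathrm{dif}}}\mathbf{X}_{s_{\mathrm{eq}}}}(\mathbf{y} \mid \mathbf{x}_{s_{\mathrm{dif}}}, \mathbf{x}_{s_{\mathrm{eq}}}) \tag{17}$$

$$P_{Y|X_{s_{\mathrm{eq}}}\beta_s}(y \mid x_{s_{\mathrm{eq}}}, b_s) := \sum_{x_{s_{\mathrm{dif}}}} P_X^{\ell}(x_{s_{\mathrm{dif}}})\, P_{Y|X_{s_{\mathrm{dif}}}X_{s_{\mathrm{eq}}}\beta_s}(y \mid x_{s_{\mathrm{dif}}}, x_{s_{\mathrm{eq}}}, b_s), \tag{18}$$

where $\ell := |s_{\mathrm{dif}}|$. Using the preceding definitions, we
introduce two information densities. The first contains
probabilities averaged over $\beta_s$,

$$\imath(\mathbf{x}_{s_{\mathrm{dif}}}; \mathbf{y} \mid \mathbf{x}_{s_{\mathrm{eq}}}) := \log\frac{P_{\mathbf{Y}|\mathbf{X}_{s_{\mathrm{dif}}}\mathbf{X}_{s_{\mathrm{eq}}}}(\mathbf{y} \mid \mathbf{x}_{s_{\mathrm{dif}}}, \mathbf{x}_{s_{\mathrm{eq}}})}{P_{\mathbf{Y}|\mathbf{X}_{s_{\mathrm{eq}}}}(\mathbf{y} \mid \mathbf{x}_{s_{\mathrm{eq}}})}, \tag{19}$$

whereas the second conditions on $\beta_s = b_s$:

$$\imath^n(\mathbf{x}_{s_{\mathrm{dif}}}; \mathbf{y} \mid \mathbf{x}_{s_{\mathrm{eq}}}, b_s) := \sum_{i=1}^n \imath(x^{(i)}_{s_{\mathrm{dif}}}; y^{(i)} \mid x^{(i)}_{s_{\mathrm{eq}}}, b_s), \tag{20}$$

where the single-letter information density is

$$\imath(x_{s_{\mathrm{dif}}}; y \mid x_{s_{\mathrm{eq}}}, b_s) := \log\frac{P_{Y|X_{s_{\mathrm{dif}}}X_{s_{\mathrm{eq}}}\beta_s}(y \mid x_{s_{\mathrm{dif}}}, x_{s_{\mathrm{eq}}}, b_s)}{P_{Y|X_{s_{\mathrm{eq}}}\beta_s}(y \mid x_{s_{\mathrm{eq}}}, b_s)}. \tag{21}$$

As mentioned above, we will generally work with discrete
random variables for clarity of exposition, in which case
the ratio is between two PMFs. In the case of continuous
observations the ratio is instead between two PDFs, and
more generally this can be replaced by the Radon-Nikodym
derivative as in the channel coding setting [?].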

Averaging (21) with respect to the random variables in
(10) conditioned on $\beta_s = b_s$ yields a conditional mutual
information, which we denote by

$$I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s) := I(X_{s_{\mathrm{dif}}}; Y \mid X_{s_{\mathrm{eq}}}, \beta_s = b_s). \tag{22}$$

This quantity will play a key role in our bounds, which will
typically have the form 




n < max 

(SdifjSeq) 


(23) 


as will be made more precise in the subsequent sections. 
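For a concrete feel for the conditional mutual information in (22), the sketch below evaluates it by brute-force enumeration for noiseless group testing, where $Y$ is the OR of the entries of $X_s$ and $\beta_s$ is fixed (so there is no dependence on $b_s$). The function names and the Bernoulli parameter `nu` are hypothetical; the closed form $(1-\nu)^{k-\ell} H_2\big((1-\nu)^{\ell}\big)$ used as a cross-check is derived for this sketch from the fact that the test outcome is informative about $X_{s_{\mathrm{dif}}}$ only when all entries of $X_{s_{\mathrm{eq}}}$ are zero.

```python
import itertools
import math


def p_y_given_xs(y, x_dif, x_eq):
    """Noiseless group testing likelihood: Y is the OR of all entries of X_s."""
    return 1.0 if y == int(any(x_dif) or any(x_eq)) else 0.0


def p_y_given_xeq(y, x_eq, ell, nu):
    """Marginal as in (18): average over X_dif with i.i.d. Bernoulli(nu) entries."""
    total = 0.0
    for x_dif in itertools.product((0, 1), repeat=ell):
        weight = math.prod(nu if v else 1.0 - nu for v in x_dif)
        total += weight * p_y_given_xs(y, x_dif, x_eq)
    return total


def conditional_mi(k, ell, nu):
    """Mean of the single-letter information density (21), i.e. the quantity (22)."""
    mi = 0.0
    for x_s in itertools.product((0, 1), repeat=k):
        x_dif, x_eq = x_s[:ell], x_s[ell:]     # first ell indices play the role of s_dif
        weight = math.prod(nu if v else 1.0 - nu for v in x_s)
        for y in (0, 1):
            num = p_y_given_xs(y, x_dif, x_eq)
            if num > 0.0:
                mi += weight * num * math.log(num / p_y_given_xeq(y, x_eq, ell, nu))
    return mi


def closed_form(k, ell, nu):
    """(1 - nu)^(k - ell) * H_2((1 - nu)^ell), in nats."""
    q = (1.0 - nu) ** ell
    h2 = -q * math.log(q) - (1.0 - q) * math.log(1.0 - q)
    return (1.0 - nu) ** (k - ell) * h2


value = conditional_mi(k=4, ell=2, nu=0.2)
```

By the symmetry assumptions, the result depends on the partition only through $\ell = |s_{\mathrm{dif}}|$, so taking the first $\ell$ indices as $s_{\mathrm{dif}}$ loses no generality here.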


III. General Achievability and Converse Bounds

In this section, we provide general results holding for
arbitrary models satisfying the assumptions given in Section
II. Each of the results for exact recovery has a direct
counterpart for partial recovery. For clarity, we focus on
the former throughout Sections III-A and III-B, and then
proceed with the latter in Section III-C.


A. Initial Non-Asymptotic Bounds 

Here we provide our main non-asymptotic upper and
lower bounds on the error probability. These bounds bear
a strong resemblance to analogous bounds from the channel
coding literature [?]; in each case, the dominant term
involves tail probabilities of the information density given
in (20). The mean of the information density is the mutual
information in (22), which thus arises naturally in the
subsequent necessary and sufficient conditions on $n$ upon
showing that the deviation from the mean is small with high
probability. The procedure for doing this given a specific
model will be given in Section III-B.

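We start with our achievability result. Here and throughout
this section, we make use of the random variables
defined in (11).

Theorem 1. For any constants $\delta_1 > 0$ and $\gamma$, there exists a
decoder such that

$$P_e \le \mathbb{P}\Bigg[\bigcup_{(s_{\mathrm{dif}},s_{\mathrm{eq}}) \,:\, s_{\mathrm{dif}} \ne \emptyset} \bigg\{\imath^n(\mathbf{X}_{s_{\mathrm{dif}}}; \mathbf{Y} \mid \mathbf{X}_{s_{\mathrm{eq}}}, \beta_s) \le \log\binom{p-k}{|s_{\mathrm{dif}}|} + \log\bigg(\frac{k}{\delta_1}\binom{k}{|s_{\mathrm{dif}}|}\bigg) + \gamma\bigg\}\Bigg] + P_0(\gamma) + 2\delta_1, \tag{24}$$

where

$$P_0(\gamma) := \mathbb{P}\bigg[\log\frac{P_{\mathbf{Y}|\mathbf{X}_s\beta_s}(\mathbf{Y} \mid \mathbf{X}_s, \beta_s)}{P_{\mathbf{Y}|\mathbf{X}_s}(\mathbf{Y} \mid \mathbf{X}_s)} > \gamma\bigg]. \tag{25}$$

Proof: See Section V-A.

Remark 1. The probability in the definition of $P_0(\gamma)$ is not
an i.i.d. sum, and the techniques for ensuring that $P_0(\gamma) \to 0$
vary between different settings. The following approaches
will suffice for all of the applications in this paper:

1. In the case that $\beta_s$ is discrete,
$P_{\mathbf{Y}|\mathbf{X}_s}(\mathbf{y} \mid \mathbf{x}_s) = \sum_{b_s} P_{\beta_s}(b_s) P_{\mathbf{Y}|\mathbf{X}_s\beta_s}(\mathbf{y} \mid \mathbf{x}_s, b_s)$, and it follows that

$$\gamma = \log\frac{1}{\min_{b_s} P_{\beta_s}(b_s)} \implies P_0(\gamma) = 0. \tag{26}$$

Moreover, this can be strengthened by noting from the
proof of Theorem 1 that $\gamma$ may depend on $\beta_s$, and
choosing $\gamma(b_s) = \log\frac{1}{P_{\beta_s}(b_s)}$ accordingly.

2. Defining

$$I_0 := I(\beta_s; \mathbf{Y} \mid \mathbf{X}_s) \tag{27}$$

$$V_0 := \mathrm{Var}\bigg[\log\frac{P_{\mathbf{Y}|\mathbf{X}_s\beta_s}(\mathbf{Y} \mid \mathbf{X}_s, \beta_s)}{P_{\mathbf{Y}|\mathbf{X}_s}(\mathbf{Y} \mid \mathbf{X}_s)}\bigg], \tag{28}$$

we have for any $\delta_0 > 0$ that

$$\gamma = I_0 + \sqrt{\frac{V_0}{\delta_0}} \implies P_0(\gamma) \le \delta_0. \tag{29}$$

This follows directly from Chebyshev’s inequality.

3. Defining

$$I_{0,+} := \mathbb{E}\Bigg[\bigg[\log\frac{P_{\mathbf{Y}|\mathbf{X}_s\beta_s}(\mathbf{Y} \mid \mathbf{X}_s, \beta_s)}{P_{\mathbf{Y}|\mathbf{X}_s}(\mathbf{Y} \mid \mathbf{X}_s)}\bigg]^+\Bigg], \tag{30}$$

we have for any $\delta_0 > 0$ that

$$\gamma = \frac{I_{0,+}}{\delta_0} \implies P_0(\gamma) \le \delta_0. \tag{31}$$

This follows directly from Markov’s inequality.

The proof of Theorem 1 is based on a decoder that
searches for a unique support set $s$ such that

$$\imath(\mathbf{x}_{s_{\mathrm{dif}}}; \mathbf{y} \mid \mathbf{x}_{s_{\mathrm{eq}}}) > \gamma_{|s_{\mathrm{dif}}|} \tag{32}$$

for some $\{\gamma_\ell\}_{\ell=1}^k$ and all $2^k - 1$ partitions $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ of $s$
with $s_{\mathrm{dif}} \ne \emptyset$. Since the numerator in (19) is the likelihood
of $\mathbf{y}$ given $(\mathbf{x}_{s_{\mathrm{dif}}}, \mathbf{x}_{s_{\mathrm{eq}}})$, this decoder can be thought of as a
weakened version of the maximum-likelihood (ML) decoder.
Like the ML decoder, computational considerations make its
implementation intractable.

The following theorem provides a general non-asymptotic
converse bound.

Theorem 2. Fix $\delta_1 > 0$, and let $(s_{\mathrm{dif}}(b_s), s_{\mathrm{eq}}(b_s))$ be
an arbitrary partition of $s = \{1,\dots,k\}$ (with $s_{\mathrm{dif}} \ne \emptyset$)
depending on $b_s \in \mathbb{R}^k$. For any decoder, we have

$$P_e \ge \mathbb{P}\Bigg[\imath^n(\mathbf{X}_{s_{\mathrm{dif}}(\beta_s)}; \mathbf{Y} \mid \mathbf{X}_{s_{\mathrm{eq}}(\beta_s)}, \beta_s) \le \log\binom{p-k+|s_{\mathrm{dif}}(\beta_s)|}{|s_{\mathrm{dif}}(\beta_s)|} + \log\delta_1\Bigg] - \delta_1. \tag{33}$$

Proof: See Section V-B.

The proof of Theorem 2 is based on Verdú-Han type
bounding techniques [?].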
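The first approach in Remark 1, namely that a discrete $\beta_s$ allows a deterministic choice of $\gamma$ with $P_0(\gamma) = 0$, can be checked numerically. The Python sketch below uses the 1-bit model with Gaussian noise and a hypothetical three-point, permutation-invariant prior on $\beta_s$; every name and parameter value is an assumption made for illustration. The check relies only on the fact that the mixture $P_{\mathbf{Y}|\mathbf{X}_s}$ is at least $P_{\beta_s}(b_s)$ times the conditional likelihood, so the log-ratio in (25) can never exceed $\log\frac{1}{\min_{b_s} P_{\beta_s}(b_s)}$.

```python
import math
import random


def phi(t):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def log_lik(ys, xs, b, sigma):
    """Log-likelihood of the n observations under the 1-bit model given beta_s = b."""
    total = 0.0
    for y, x in zip(ys, xs):
        t = sum(xi * bi for xi, bi in zip(x, b)) / sigma
        total += math.log(phi(t) if y == 1 else phi(-t))
    return total


def log_ratio(ys, xs, b_true, prior, sigma):
    """Log-ratio inside (25): conditional likelihood over the beta-averaged one."""
    terms = [math.log(pb) + log_lik(ys, xs, b, sigma) for b, pb in prior]
    m = max(terms)
    log_mixture = m + math.log(sum(math.exp(v - m) for v in terms))  # log-sum-exp
    return log_lik(ys, xs, b_true, sigma) - log_mixture


rng = random.Random(1)
prior = [((1.0, 1.0), 0.5), ((1.0, 2.0), 0.25), ((2.0, 1.0), 0.25)]  # hypothetical
gamma = math.log(1.0 / min(pb for _, pb in prior))                   # choice in (26)
sigma, n = 1.0, 20

worst = -math.inf
for _ in range(200):
    b_true = rng.choices([b for b, _ in prior], weights=[pb for _, pb in prior])[0]
    xs = [[rng.gauss(0.0, 1.0) for _ in b_true] for _ in range(n)]
    ys = [1 if sum(xi * bi for xi, bi in zip(x, b_true)) + rng.gauss(0.0, sigma) >= 0
          else -1 for x in xs]
    worst = max(worst, log_ratio(ys, xs, b_true, prior, sigma))
```

Across all trials the largest observed log-ratio stays at or below `gamma`, consistent with the event in (25) having probability zero for this choice.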


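B. Techniques for Applying Theorems 1 and 2

The bounds presented in the preceding theorems do not
directly reveal the number of measurements required to
achieve a vanishing error probability. In this subsection,
we present the steps that can be used to obtain such
conditions. We provide examples in Section IV.

The idea is to use a concentration inequality to bound the
first term in (24) (or (33)), which is possible due to the fact
that each summation $\imath^n$ is conditionally i.i.d. given $\beta_s$. We
proceed by providing the details of these steps separately
for the achievability and converse. We start with the former.

1. Observe that, conditioned on $\beta_s = b_s$, the
mean of $\imath^n(\mathbf{X}_{s_{\mathrm{dif}}}; \mathbf{Y} \mid \mathbf{X}_{s_{\mathrm{eq}}}, \beta_s)$ is $n I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s)$, where
$I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s)$ is defined in (22).

2. Fix $\delta_2 \in (0,1)$, and suppose that for a fixed value $b_s$
of $\beta_s$, we have for all $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ that

$$\log\binom{p-k}{|s_{\mathrm{dif}}|} + \log\bigg(\frac{k}{\delta_1}\binom{k}{|s_{\mathrm{dif}}|}\bigg) + \gamma \le n(1-\delta_2) I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s), \tag{34}$$

and

$$\mathbb{P}\Big[\imath^n(\mathbf{X}_{s_{\mathrm{dif}}}; \mathbf{Y} \mid \mathbf{X}_{s_{\mathrm{eq}}}, b_s) \le n(1-\delta_2) I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s) \,\Big|\, \beta_s = b_s\Big] \le \psi_{|s_{\mathrm{dif}}|}(n, \delta_2) \tag{35}$$

for some functions $\{\psi_\ell\}_{\ell=1}^k$ (e.g., these may arise from
Chebyshev’s inequality or Bernstein’s inequality [38,
Ch. 2]). Combining these conditions with the union
bound, we obtain

$$\mathbb{P}\Bigg[\bigcup_{(s_{\mathrm{dif}},s_{\mathrm{eq}}) \,:\, s_{\mathrm{dif}} \ne \emptyset} \bigg\{\imath^n(\mathbf{X}_{s_{\mathrm{dif}}}; \mathbf{Y} \mid \mathbf{X}_{s_{\mathrm{eq}}}, \beta_s) \le \log\binom{p-k}{|s_{\mathrm{dif}}|} + \log\bigg(\frac{k}{\delta_1}\binom{k}{|s_{\mathrm{dif}}|}\bigg) + \gamma\bigg\} \,\Bigg|\, \beta_s = b_s\Bigg] \le \sum_{\ell=1}^k \binom{k}{\ell} \psi_\ell(n, \delta_2). \tag{36}$$

3. Observe that the condition in (34) can be written as

$$n \ge \frac{\log\binom{p-k}{|s_{\mathrm{dif}}|} + \log\Big(\frac{k}{\delta_1}\binom{k}{|s_{\mathrm{dif}}|}\Big) + \gamma}{I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s)(1-\delta_2)}. \tag{37}$$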
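The role of the functions $\psi_\ell$ in the concentration step can be illustrated with Chebyshev’s inequality, one of the options mentioned above. Since the information density sum has mean $nI$ and variance $nV$ given $\beta_s = b_s$ (with $I$ and $V$ the single-letter mean and variance), the event that it falls below $n(1-\delta_2)I$ has probability at most $V / (n \delta_2^2 I^2)$. The Python sketch below evaluates this bound for noiseless group testing, where the single-letter density takes only three values, and compares it with a Monte Carlo estimate. All function names and parameter values are hypothetical choices for this sketch.

```python
import math
import random


def density_distribution(k, ell, nu):
    """(value, probability) pairs of the single-letter information density for
    noiseless group testing with i.i.d. Bernoulli(nu) tests and |s_dif| = ell."""
    a = (1.0 - nu) ** (k - ell)   # probability that X_eq is all zero
    q = (1.0 - nu) ** ell         # probability that X_dif is all zero
    return [(0.0, 1.0 - a), (-math.log(q), a * q), (-math.log(1.0 - q), a * (1.0 - q))]


def chebyshev_psi(k, ell, nu, n, delta2):
    """Chebyshev-based tail bound V / (n * delta2^2 * I^2), capped at one."""
    dist = density_distribution(k, ell, nu)
    mean = sum(v * w for v, w in dist)
    var = sum(w * (v - mean) ** 2 for v, w in dist)
    return min(1.0, var / (n * (delta2 * mean) ** 2))


def empirical_tail(k, ell, nu, n, delta2, trials, seed=0):
    """Monte Carlo estimate of P[sum of n densities <= n * (1 - delta2) * I]."""
    rng = random.Random(seed)
    dist = density_distribution(k, ell, nu)
    values = [v for v, _ in dist]
    weights = [w for _, w in dist]
    mean = sum(v * w for v, w in dist)
    threshold = n * (1.0 - delta2) * mean
    hits = 0
    for _ in range(trials):
        hits += sum(rng.choices(values, weights=weights, k=n)) <= threshold
    return hits / trials


psi = chebyshev_psi(k=4, ell=2, nu=0.2, n=200, delta2=0.5)
freq = empirical_tail(k=4, ell=2, nu=0.2, n=200, delta2=0.5, trials=500)
```

Chebyshev’s inequality only gives a $1/n$ decay, which is typically far from tight; sharper choices such as Bernstein’s inequality matter when the binomial factor multiplying each $\psi_\ell$ in the union bound is large.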

We summarize the preceding findings in the following. 

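Theorem 3. For any constants $\delta_1 > 0$, $\delta_2 \in (0,1)$ and $\gamma$,
and functions $\{\psi_\ell\}_{\ell=1}^k$ ($\psi_\ell : \mathbb{Z} \times \mathbb{R} \to \mathbb{R}$), define the set

$$\mathcal{B}(\delta_1, \delta_2, \gamma) := \big\{b_s : \text{(35) and (37) hold for all } (s_{\mathrm{dif}}, s_{\mathrm{eq}}) \text{ with } s_{\mathrm{dif}} \ne \emptyset\big\}. \tag{38}$$

Then we have

$$P_e \le \mathbb{P}\big[\beta_s \notin \mathcal{B}(\delta_1, \delta_2, \gamma)\big] + \sum_{\ell=1}^k \binom{k}{\ell} \psi_\ell(n, \delta_2) + P_0(\gamma) + 2\delta_1. \tag{39}$$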

Remark 2. The preceding arguments remain unchanged
when $\delta_2$ also depends on $\ell = |s_{\mathrm{dif}}|$. We leave this possible
dependence implicit throughout this section, since a fixed
value will suffice for all but one of the models considered
in Section IV.

In the case that (35) holds for all $b_s$ (or more generally,
within a set whose probability under $P_{\beta_s}$ tends to one) and
the final three terms in (39) vanish, the overall upper bound
approaches the probability, with respect to $P_{\beta_s}$, that (37)
fails to hold. In many cases, the second logarithm in the
numerator therein is dominated by the first. It should be
noted that the condition that the second term in (39) vanishes
can also impose conditions on $n$. For most of the examples
presented in Section IV, the condition in (37) will be the
dominant one; however, this need not always be the case,
and it depends on the concentration inequality used in (35).

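The application of Theorem 2 is done using similar steps,
so we provide less detail. Fix $\delta_2 > 0$, and suppose that, for a
fixed value $b_s$ of $\beta_s$, the pair $(s_{\mathrm{dif}}, s_{\mathrm{eq}}) = (s_{\mathrm{dif}}(b_s), s_{\mathrm{eq}}(b_s))$
is such that

$$\log\binom{p-k+|s_{\mathrm{dif}}|}{|s_{\mathrm{dif}}|} + \log\delta_1 \ge n(1+\delta_2) I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s), \tag{40}$$

and

$$\mathbb{P}\Big[\imath^n(\mathbf{X}_{s_{\mathrm{dif}}}; \mathbf{Y} \mid \mathbf{X}_{s_{\mathrm{eq}}}, b_s) \le n(1+\delta_2) I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s) \,\Big|\, \beta_s = b_s\Big] \ge 1 - \psi'_{|s_{\mathrm{dif}}|}(n, \delta_2) \tag{41}$$

for some function $\psi'_\ell$. Combining these conditions, we
see that the first probability in (33), with an added
conditioning on $\beta_s = b_s$, is lower bounded by $1 - \psi'_{|s_{\mathrm{dif}}|}(n, \delta_2)$. In the
case that $\psi'_\ell$ is defined for multiple $\ell$ values corresponding
to different values of $b_s$, we can further lower bound this
by $1 - \max_\ell \psi'_\ell(n, \delta_2)$.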

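Next, we observe that (40) holds if and only if

$$n \le \frac{\log\binom{p-k+|s_{\mathrm{dif}}|}{|s_{\mathrm{dif}}|} + \log\delta_1}{I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s)(1+\delta_2)}. \tag{42}$$

Recalling that the partition $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ is an arbitrary function
of $\beta_s$, we can ensure that this coincides with

$$n \le \max_{(s_{\mathrm{dif}},s_{\mathrm{eq}}) \,:\, s_{\mathrm{dif}} \ne \emptyset} \frac{\log\binom{p-k+|s_{\mathrm{dif}}|}{|s_{\mathrm{dif}}|} + \log\delta_1}{I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s)(1+\delta_2)} \tag{43}$$

by choosing each pair $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ as a function of $b_s$ to
achieve this maximum.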

Finally, we note that the maximum over $\ell$ in the above-derived
term $1 - \max_\ell \psi'_\ell(n,\delta_2)$ may be restricted to any set
$\mathcal{L} \subseteq \{1,\dots,k\}$ provided that $|s_{\mathrm{dif}}|$ is constrained similarly
in (43); one simply chooses the partition $(s_{\mathrm{dif}}(b_s), s_{\mathrm{eq}}(b_s))$
so that $\ell = |s_{\mathrm{dif}}|$ always lies in this set. Putting everything
together, we have the following.


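Theorem 4. For any set $\mathcal{L} \subseteq \{1,\dots,k\}$, constants $\delta_1 > 0$
and $\delta_2 > 0$, and functions $\{\psi'_\ell\}_{\ell \in \mathcal{L}}$ ($\psi'_\ell : \mathbb{Z} \times \mathbb{R} \to \mathbb{R}$),
define the set

$$ \mathcal{B}'(\delta_1,\delta_2) := \big\{ b_s : \text{(41) and (42) hold for all } (s_{\mathrm{dif}}, s_{\mathrm{eq}}) \text{ with } |s_{\mathrm{dif}}| \in \mathcal{L} \big\}. \tag{44} $$

Then we have

$$ P_e \ge P\big[\beta_s \in \mathcal{B}'(\delta_1,\delta_2)\big] \Big(1 - \max_{\ell \in \mathcal{L}} \psi'_\ell(n,\delta_2)\Big) - \delta_1. \tag{45} $$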

If the pair $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ had been fixed in Theorem 2, as opposed
to being a function of $\beta_s$, then we would have only obtained
a weaker result with the statement "for all $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$"
in (44) replaced by a fixed pair. Assuming that the remainder
terms in (45) are insignificant, this weaker result is of the
form $P_e \ge \max_{(s_{\mathrm{dif}}, s_{\mathrm{eq}})} P[n \le f(s_{\mathrm{dif}}, s_{\mathrm{eq}}, \beta_s)]$ rather than
$P_e \ge P[n \le \max_{(s_{\mathrm{dif}}, s_{\mathrm{eq}})} f(s_{\mathrm{dif}}, s_{\mathrm{eq}}, \beta_s)]$. This can lead to
significantly different bounds on the sample complexity, and
the distinction is crucial in our applications in Section IV. As
described in the proof in Section V, the key to obtaining this
difference is in applying a refined version of an argument
based on a genie.

The general steps in applying Theorems 3 and 4 to
specific problems are outlined in Procedure 1.

Procedure 1: Steps for Obtaining Necessary and Sufficient
Conditions on n from Theorems 3 and 4

1. (Identify a Typical Set) Construct a sequence of "typical"
sets $\mathcal{T}_\beta \subseteq \mathbb{R}^k$ of non-zero entries, indexed by
$p$, such that $P[\beta_s \in \mathcal{T}_\beta] \to 1$, thus restricting the
vectors $b_s$ for which $\imath^n(X_{s_{\mathrm{dif}}}; Y \mid X_{s_{\mathrm{eq}}}, b_s)$ needs to be
characterized.

2. (Bound the Information Density Tail Probabilities) Using
a concentration inequality for i.i.d. summations
(e.g., Chebyshev, Bernstein), bound the tail probabilities
in (35) and (41) for each $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ and $b_s \in \mathcal{T}_\beta$,
with a fixed constant $\delta_2$. Upon making these dependent
on $(s_{\mathrm{dif}}, s_{\mathrm{eq}}, b_s)$ only through $\ell := |s_{\mathrm{dif}}|$, the bounds
are denoted by $\psi_\ell(n,\delta_2)$ and $\psi'_\ell(n,\delta_2)$.

3. (Control the Remainder Terms) By suitable rearrangements,
find conditions on $n$ under which the terms
$\sum_{\ell=1}^{k} \binom{k}{\ell} \psi_\ell(n,\delta_2)$ and $\max_{\ell \in \mathcal{L}} \psi'_\ell(n,\delta_2)$ in (39) and
(45) vanish, thus ensuring that their contribution is
negligible. Similarly, choose $\delta_1$ to vanish with $p$ so that
its contribution is negligible, and for the achievability
part, choose $\gamma$ such that the remainder term $P_0(\gamma)$
vanishes (cf. Remark 1).

4. (Combine and Simplify) Combine the previous steps as
follows:

a) Construct the set of non-zero entries $\mathcal{B}(\delta_1,\delta_2,\gamma) \subseteq \mathbb{R}^k$
(respectively, $\mathcal{B}'(\delta_1,\delta_2)$) in (38) (respectively, (44));

b) Deduce from (39) (respectively, (45)) and Step 3 that
$P_e \le P[\beta_s \notin \mathcal{B}(\delta_1,\delta_2,\gamma)] + o(1)$ (respectively,
$P_e \ge P[\beta_s \in \mathcal{B}'(\delta_1,\delta_2)] + o(1)$);

c) From the properties of the typical set $\mathcal{T}_\beta$ in Steps 1-2,
deduce that $P_e \to 0$ (respectively, $P_e \to 1$) when
$n$ satisfies (37) (respectively, (42)) for all $b_s \in \mathcal{T}_\beta$;

d) Augment this condition on $n$ with Step 3.

In our experience, the choice of $\mathcal{T}_\beta$ in the first step
of Procedure 1 usually comes naturally given the specific
model. On the other hand, it is often less straightforward
to find a sufficiently powerful concentration inequality in
Step 2. A simple choice is Chebyshev's inequality, which
expresses $\psi_\ell$ and $\psi'_\ell$ in terms of $I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s)$ (see (22)) and
the corresponding variances of the information densities.
This choice is usually effective for the converse, whereas
the achievability part typically requires sharper concentration
inequalities such as Bernstein's inequality, due to the
combinatorial terms in (39).
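To illustrate why the combinatorial terms favor exponential tail bounds, the following sketch compares a Chebyshev bound with the textbook form of Bernstein's inequality after multiplying each by $\binom{k}{\ell}$. It is a minimal sketch under an assumption that is not made in the paper: the summands are taken to be bounded by a constant `bound` (information densities are not bounded in general, and the Bernstein-type inequalities used for specific models take a different form). All function names are hypothetical.

```python
import math


def chebyshev_tail(variance: float, n: int, delta: float) -> float:
    """Chebyshev bound on P[|S_n - n*mu| >= n*delta] for i.i.d. summands."""
    return min(1.0, variance / (n * delta ** 2))


def bernstein_tail(variance: float, bound: float, n: int, delta: float) -> float:
    """Textbook Bernstein bound, assuming |X_i - mu| <= bound almost surely."""
    exponent = n * delta ** 2 / (2.0 * (variance + bound * delta / 3.0))
    return min(1.0, 2.0 * math.exp(-exponent))


def with_combinatorial_term(tail: float, k: int, ell: int) -> float:
    """Multiply a tail bound by C(k, ell), mimicking the summand in (39)."""
    return math.comb(k, ell) * tail


if __name__ == "__main__":
    n, delta, k, ell = 200, 0.5, 20, 10
    cheb = chebyshev_tail(1.0, n, delta)
    bern = bernstein_tail(1.0, 1.0, n, delta)
    print("Chebyshev x C(k, ell):", with_combinatorial_term(cheb, k, ell))
    print("Bernstein x C(k, ell):", with_combinatorial_term(bern, k, ell))
```

With these illustrative values the Chebyshev term exceeds one once the binomial coefficient is included, whereas the exponentially decaying bound remains small. The converse bound (45) has no such binomial factor, which is consistent with Chebyshev's inequality usually sufficing there.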


C. Extensions to Partial Recovery 

We now turn to the partial support recovery criterion.
The changes in the analysis required to generalize
Theorems 3 and 4 to this setting are given in Section V-C;
rather than repeating each of these, we focus our attention
on the resulting analogues of Theorems 3 and 4.

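Theorem 5. For any constants $\delta_1 > 0$, $\delta_2 \in (0,1)$ and
$\gamma > 0$, and functions $\{\psi_\ell\}_{\ell=d_{\max}+1}^{k}$ ($\psi_\ell : \mathbb{Z} \times \mathbb{R} \to \mathbb{R}$),
define the set

$$ \mathcal{B}(\delta_1,\delta_2,\gamma) := \big\{ b_s : \text{(35) and (37) hold for all } (s_{\mathrm{dif}}, s_{\mathrm{eq}}) \text{ with } |s_{\mathrm{dif}}| \in \{d_{\max}+1,\dots,k\} \big\}. \tag{46} $$

Then we have

$$ P_e(d_{\max}) \le P\big[\beta_s \notin \mathcal{B}(\delta_1,\delta_2,\gamma)\big] + \sum_{\ell=d_{\max}+1}^{k} \binom{k}{\ell} \psi_\ell(n,\delta_2) + P_0(\gamma) + 2\delta_1, \tag{47} $$

where $P_0(\gamma)$ is the remainder term defined previously.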

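For the converse part, (42) is replaced by

$$ n \le \frac{\log\binom{p-k+|s_{\mathrm{dif}}|}{|s_{\mathrm{dif}}|} - \log\sum_{d=0}^{d_{\max}} \binom{p-k}{d}\binom{k}{d} + \log\delta_1}{I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s)\,(1+\delta_2)}, \tag{48} $$

and we have the following analog of Theorem 4.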

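Theorem 6. For any set $\mathcal{L} \subseteq \{d_{\max}+1,\dots,k\}$, constants
$\delta_1 > 0$ and $\delta_2 \in (0,1)$, and functions $\{\psi'_\ell\}_{\ell \in \mathcal{L}}$ ($\psi'_\ell : \mathbb{Z} \times \mathbb{R} \to \mathbb{R}$),
define the set

$$ \mathcal{B}'(\delta_1,\delta_2) := \big\{ b_s : \text{(41) and (48) hold for all } (s_{\mathrm{dif}}, s_{\mathrm{eq}}) \text{ with } |s_{\mathrm{dif}}| \in \mathcal{L} \big\}. \tag{49} $$

Then we have

$$ P_e(d_{\max}) \ge P\big[\beta_s \in \mathcal{B}'(\delta_1,\delta_2)\big] \Big(1 - \max_{\ell \in \mathcal{L}} \psi'_\ell(n,\delta_2)\Big) - \delta_1. \tag{50} $$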

The applications of Theorems 5 and 6 follow identical
steps to Procedure 1. However, it will be seen that the
restriction $|s_{\mathrm{dif}}| > d_{\max}$ can in fact considerably simplify
these steps, since it removes the need to obtain concentration
inequalities for smaller values of $|s_{\mathrm{dif}}|$.


D. Comparison to Fano’s Inequality 

Most previous works on the information-theoretic limits
of sparsity recovery have made use of Fano's inequality [39,
Sec. 2.11]. For this reason, we provide here a discussion on
the relative merits of this approach and our approach. To
this end, we consider the following bound, which can be
obtained by combining the Fano-based analysis of previous works with our
refined genie argument:


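$$ P_e \ge \sum_{b_s} P_{\beta_s}(b_s) \max\Bigg\{ 0,\; 1 - \frac{n\, I_{s_{\mathrm{dif}}(b_s),s_{\mathrm{eq}}(b_s)}(b_s) + \log 2}{\log\binom{p-k+|s_{\mathrm{dif}}(b_s)|}{|s_{\mathrm{dif}}(b_s)|}} \Bigg\} \tag{51} $$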

in the notation of Theorem 2. By analyzing this bound
similarly to Section III-B, we obtain for any $\delta_2 > 0$ that

where 


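$$ \mathcal{B}_{\mathrm{Fano}}(\delta_2) := \Bigg\{ b_s : n \le \frac{\log\binom{p-k+|s_{\mathrm{dif}}|}{|s_{\mathrm{dif}}|}\,(1-\delta_2)}{I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s)} \text{ for all } (s_{\mathrm{dif}}, s_{\mathrm{eq}}) \text{ with } s_{\mathrm{dif}} \ne \emptyset \Bigg\}. \tag{53} $$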


A similar result for partial recovery can also be derived by
incorporating the arguments from previous work and the present paper.

As discussed in the introduction, the key advantage of
Theorem 4 is that it provides a more precise characterization
of how far the error probability is from zero, and
in particular, the conditions under which $P_e \to 1$ (strong
impossibility results). On the other hand, the bound on $P_e$
in (52) is always bounded away from one for fixed $\delta_2$, and
becomes increasingly weak for small $\delta_2$.

The advantage of Fano's inequality is that it only requires
the mutual information to be computed, whereas our
approach also requires the application of a concentration
inequality. This, in turn, typically requires the variance of the
information density to be characterized, which is not always
straightforward. However, as discussed following Procedure 1,
the main difficulty associated with these concentration
inequalities is typically in finding one which is sufficiently
powerful for the achievability part. Thus, the added difficulty
in the converse may not add to the overall difficulty in
deriving matching achievability and converse bounds.
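The limitation of Fano-type bounds discussed above can be seen in a small numerical sketch. It assumes the standard Fano form $\max\{0, 1 - (nI + \log 2)/\log M\}$, with $M$ the number of hypotheses, as a stand-in for the summand in (51); the function names and parameter values are hypothetical.

```python
import math


def log_binom(n: int, r: int) -> float:
    """Natural log of C(n, r) via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)


def fano_lower_bound(n: float, mutual_info: float, log_num_hypotheses: float) -> float:
    """Standard Fano-type lower bound on the error probability (assumed form)."""
    return max(0.0, 1.0 - (n * mutual_info + math.log(2.0)) / log_num_hypotheses)


if __name__ == "__main__":
    p, k, ell = 10 ** 6, 10, 1
    mutual_info, delta2 = 0.05, 0.2
    log_m = log_binom(p - k + ell, ell)
    # Number of measurements sitting a factor (1 - delta2) below the threshold.
    n = (1.0 - delta2) * log_m / mutual_info
    print(fano_lower_bound(n, mutual_info, log_m))  # close to delta2, not to 1
```

At a number of measurements a factor $1 - \delta_2$ below the threshold, the bound is roughly $\delta_2$ rather than close to one, which matches the qualitative statement that the Fano-based bound is bounded away from one and weakens as $\delta_2$ decreases.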


IV. Applications to Specific Models
In this section, we present applications of Theorems 3-6
to the linear, 1-bit, and group testing models,
and to more general models with discrete observations.
Throughout the section, we make use of general concentration
inequalities given in the appendix. We also make use of
the following variance quantity:

$$ V_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s) := \mathrm{Var}\big[ \imath(X_{s_{\mathrm{dif}}}; Y \mid X_{s_{\mathrm{eq}}}, \beta_s) \,\big|\, \beta_s = b_s \big]. \tag{54} $$


A. Linear Model with Discrete $\beta_s$

Here we consider the linear model, where each observation
takes the form

$$ Y = \langle X, \beta \rangle + Z, \tag{55} $$

where $Z \sim N(0, \sigma^2)$ for some $\sigma > 0$.

Without loss of generality, we consider the fixed support
set $s = \{1,\dots,k\}$. Following the setup of [17], we let
$\beta_s$ be a uniformly random permutation of a fixed vector
$(b_1,\dots,b_k)$, and we choose $P_X \sim N(0,1)$. Since both
the measurement matrices and the noise are Gaussian, the
mutual information in (22) is given by [39, Ch. 10]

$$ I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s) = \frac{1}{2} \log\Big(1 + \frac{1}{\sigma^2} \sum_{i \in s_{\mathrm{dif}}} b_i^2\Big). \tag{56} $$

Throughout this subsection, we denote $b_{\min} := \min_i |b_i|$
and $b_{\max} := \max_i |b_i|$. We assume that $\sigma^2 = \Theta(1)$, and that
$b_{\min} = \Theta(b_{\max})$ and $b_{\min} = O(1)$; note that $b_{\min} = o(1)$
is allowed. The steps of Procedure 1 are as follows.

Step 1: We trivially choose the typical set $\mathcal{T}_\beta$ to contain
all vectors on the support of $P_{\beta_s}$.

Step 2: We make use of the following concentration 
inequality based on Bernstein’s inequality. 

Proposition 1. Under the preceding setup for the linear 
model, we have for all (sdif,Seq) and bs that 


(X,,,;Y|X, 


,bs)-nls,,,,,s,^{bs)\ > nS 


13s = bs 


where 


< 2 exp — 


2 as 


(cr-f cr^dif) 


(57) 

(58) 


aY ■■= bl 

Proof: See Appendix [B| ■ 

Setting 6 = ( 52 /sdif.Seq((>«), it follows that in ( [T5] l and ( |4T] ) 
we can set 


'/’tin, 62 ) = 'f'lin, 62 ) = 2 max 

(sdif.Seq.bs):|Sdit|=^ 
iS2lssii,s,^ibs)fn 


exp - 


2(4Q;ddif T ‘^27sdif .Seq 


(59) 


Step 3: In accordance with the first item of Remark 1,
we set $\gamma$ as in (26) so that $P_0(\gamma) = 0$.

We focus on the conditions on $n$ under which the term
$\sum_{\ell=1}^{k} \binom{k}{\ell} \psi_\ell(n,\delta_2)$ in (39) vanishes; the term containing $\psi'_\ell$
in (45) can be handled in a similar yet simpler fashion.
By the assumptions $\sigma^2 = \Theta(1)$ and $b_{\max} = \Theta(b_{\min})$, we
readily obtain $I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s) = \Theta(\log(1 + \ell b_{\min}^2))$ and $\alpha_{s_{\mathrm{dif}}} = \Theta(\min\{1, \ell b_{\min}^2\})$
using (56) and (58), where $\ell = |s_{\mathrm{dif}}|$.
Using these growth rates and upper bounding the summation
in (39) by $k$ times the corresponding maximum, we see that
$\sum_{\ell=1}^{k} \binom{k}{\ell} \psi_\ell(n,\delta_2) \to 0$ provided that the following holds
for some sufficiently small constant $\zeta$ (depending on $\delta_2$):


_ nlog^(l + fb^i„) _^ 

min{l,f&^iJ + log(l + £bl,J^vnm{l,£bYJ 

k 

— £ log - — log /c —>■ 00 (60) 

for all £. We now treat two cases separately; 

• If = 0 ( 1 ), the first term in ( |60l l behaves as 

0 (nf 6 ^;jj); by rearranging, we conclude that it suffices 
that $n \to \infty$ and n = with a sufficiently large

1- 1 

implied constant. 

• If $\ell b_{\min}^2 = \Omega(1)$, the first term in (60) behaves as $\Omega(n)$,
and it thus suffices that $n \to \infty$ and $n = \Omega(k)$ with a
sufficiently large implied constant.

Thus, the overall condition that we require is n —oo and 

n = and n = n{k), (61) 

with sufficiently large implied constants. For the converse,
the analogous condition to (60) contains only the first term
on the left-hand side (the difference being due to the fact that
the combinatorial term in (39) is not present in (45)), and a
similar argument reveals that it suffices that $n = \omega(1/b_{\min}^2)$.

Step 4: Combining the preceding steps and applying
asymptotic simplifications, we obtain the following.


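Corollary 1. Under the preceding setup for the linear model
with $\sigma^2 = \Theta(1)$, $b_{\min} = \Theta(b_{\max})$, $b_{\min} = O(1)$, $k = o(p)$,
and $m_\beta$ distinct elements in $(b_1,\dots,b_k)$, we have $P_e \to 0$
as $p \to \infty$ provided that

$$ n \ge \max_{s_{\mathrm{dif}} \ne \emptyset} \frac{\log\binom{p-k}{|s_{\mathrm{dif}}|}}{\frac{1}{2}\log\Big(1 + \frac{1}{\sigma^2}\sum_{i \in s_{\mathrm{dif}}} b_i^2\Big)} (1+\eta) \tag{62} $$

for some $\eta > 0$, under any one of the following additional conditions: (i)
$k = \Theta(1)$; (ii) $k = o(\log p)$ and $m_\beta = \Theta(1)$; (iii) $k = \Theta((\log p)^\theta)$
for some $\theta > 0$, and $m_\beta = 1$; (iv) $k = \Theta(p^\theta)$
for some $\theta \in (0,1)$, $b_{\min}^2 = \Theta\big(\frac{\log k}{k}\big)$, and $m_\beta = 1$.

Conversely, without any additional conditions, we have
$P_e \to 1$ as $p \to \infty$ whenever

$$ n \le \max_{s_{\mathrm{dif}} \ne \emptyset} \frac{\log\binom{p-k+|s_{\mathrm{dif}}|}{|s_{\mathrm{dif}}|}}{\frac{1}{2}\log\Big(1 + \frac{1}{\sigma^2}\sum_{i \in s_{\mathrm{dif}}} b_i^2\Big)} (1-\eta) \tag{63} $$

for some $\eta > 0$.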

Proof: The converse part follows from (45) with $\delta_1 \to 0$
sufficiently slowly. To check the condition $n = \omega(1/b_{\min}^2)$
stated following (61), we may assume without loss of
generality that (63) holds with equality, since the decoder
can always choose to ignore additional measurements. When
equality holds, we observe that for the worst-case $s_{\mathrm{dif}}$ with
$\ell = 1$, the denominator therein behaves as $\Theta(b_{\min}^2)$ (since
$b_{\min} = O(1)$) and the numerator behaves as $\Theta(\log p)$, and
hence, the condition $n = \omega(1/b_{\min}^2)$ is satisfied.

For the achievability part, we first use ([3^ to obtain 

pog(i + ;,E.,„,i>l) 

(64) 

where the final term in the numerator arises from ( |26l l since 
Pps (('s) 1 ® '^ 6 e same for all permutations of ( 61 ,..., hf}, and 
is lower bounded by . Observe that the first term in the 
numerator behaves as 0 (|sdif| logp) for each of the cases 
in the corollary statement, and the second term behaves as 
0 (logfc-f |sdif|log 1 ^). 


In cases (i)-(iii), we have log A: = o(logp), and it 
immediately follows that the numerator in ( |64l l is dominated 
by the first term, and hence, the others can be factored into 
p in ( |62| l. Moreover, in case (i), both conditions in ( |M] l are 
dominated by the objective in ( |64l l with I := |sdif| = 1, 
which behaves as 0(|^^). In cases (ii)-(iii), the first 
condition in ( | 6 T] ) is again dominated by the term in ( |64| ) 
with i = 1. The second condition is dominated by the term 
with t = k, which behaves as 0 ( i„g(i+fcblj ) = ^(^ii|i)- 

In case (iv), the first term in the numerator of ^64| ) 
may not be dominant for small I := |sdif|, since logfc = 
0(logp). However, by observing that the objective scales as 
0 ( ioga+fb^T ) ) assumed scaling of it is 

readily verified that the maximum can only be achieved with 
(. = 0(fc). For any such maximizer, we have log (^7^) = 
0 (A;logp), and hence, the second term in the numerator of 
can be factored into 77 , as it behaves as 0{k). The two 
conditions in ( |M] | are identical under the given scaling of 
^min’ dominated by the objective in ( |62] i with £ = k, 

which behaves as 0 ( °pg\. ) ■ ■ 

In the case that $b_{\min} = \Theta(1)$, the thresholds given in
Corollary 1 coincide with those given in the main results of
[17]. Our framework has the advantage of handling the case
that $b_{\min} = o(1)$, as well as providing the strong converse
($P_e \to 1$) instead of the weak converse ($P_e \not\to 0$). However,
it should be noted that the achievability parts of [17] have the
notable advantage of using a decoder that does not depend
on the distribution of $\beta_s$.

On first glance, the bounds in (62)-(63) may appear to be
difficult to evaluate, since the maximizations are over $2^k - 1$
non-empty subsets $s_{\mathrm{dif}}$. However, it is in fact only $k$ of them
that need to be computed, since for any given $\ell = |s_{\mathrm{dif}}|$ the
maximizing $s_{\mathrm{dif}}$ is the one with the smallest corresponding
value of $\sum_{i \in s_{\mathrm{dif}}} b_i^2$.
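The reduction from $2^k - 1$ subsets to $k$ candidates can be sketched as follows. The code evaluates the maximization in (62), without the $1 + \eta$ factor, once by brute force and once by sorting the squared entries and using prefix sums. It assumes the Gaussian mutual information (56) in nats and non-zero entries $b_i$; the function names are hypothetical.

```python
import math
from itertools import combinations


def log_binom(n: int, r: int) -> float:
    """Natural log of C(n, r) via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)


def mutual_info(sum_sq: float, sigma2: float) -> float:
    """Mutual information of the form (56): 0.5 * log(1 + sum_sq / sigma2)."""
    return 0.5 * math.log1p(sum_sq / sigma2)


def threshold_brute_force(b, p, sigma2=1.0):
    """Maximize over all non-empty subsets s_dif (exponential in k)."""
    k = len(b)
    best = 0.0
    for ell in range(1, k + 1):
        for subset in combinations(b, ell):
            sum_sq = sum(x * x for x in subset)
            best = max(best, log_binom(p - k, ell) / mutual_info(sum_sq, sigma2))
    return best


def threshold_sorted(b, p, sigma2=1.0):
    """Same maximum using only the ell smallest squared entries for each ell."""
    k = len(b)
    best, prefix = 0.0, 0.0
    for ell, sq in enumerate(sorted(x * x for x in b), start=1):
        prefix += sq  # smallest achievable sum of squares for this subset size
        best = max(best, log_binom(p - k, ell) / mutual_info(prefix, sigma2))
    return best


if __name__ == "__main__":
    b = [0.5, 1.0, 0.7, 1.3, 0.9]
    print(threshold_brute_force(b, p=1000), threshold_sorted(b, p=1000))
```

For a fixed subset size the numerator is constant and the denominator is increasing in the sum of squares, so the smallest sum attains the maximum; the sorted version therefore needs $O(k \log k)$ operations.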

Comparison to the LASSO: Conditions for the support 
recovery of the computationally tractable LASSO algorithm 
were given by Wainwright Q. Several comparisons to the 
information-theoretic limits were given in 0 , § in terms 
of scaling laws; here we complement these comparisons by 
briefly discussing the corresponding constant factors. For 
simplicity, we focus on the case that the non-zero entries are 
all equal to a common value bg = '^ (for some constant cp 
representing the per-sample SNR) and k is poly-logarithmic 
in p, corresponding to case (iii) of Corollary 

The results of 0 state that LASSO requires at least 
( 2 fclogp)(l-|-o(l)) measurements regardless of cp, and that 
this bound is also achievable in the limit as cp 00 . On the 
other hand. Corollary reveals that for the optimal decoder, 
the coefficient to klogp can be arbitrarily small provided 
that Cp is large enough. More precisely, applying some 
simple manipulations to ( |62] l, we find that the coefficient to 
k logp is sup„g(o 1 ] 1 where a represents the ra¬ 

tio It is easy to verify that the maximum is achieved at 
a = 1, yielding the constant We conclude that the 

LASSO provably yields a suboptimal constant when cp > 1, 
and fails to achieve the optimal logarithmic decay. However, 
it should be noted that our decoder requires knowledge of k
and cp, whereas the LASSO does not (except possibly via 
their role in determining the regularization parameter). 


B. Linear Model with Gaussian $\beta_s$ and Partial Recovery

In this subsection, we consider the setup of Section
IV-A with two changes: we let the distribution of $\beta_s$ be
continuous rather than discrete, and we consider partial
recovery instead of exact recovery. More specifically, we
let $\beta_s$ be i.i.d. on $N(0, \sigma_\beta^2)$ for some variance $\sigma_\beta^2$, and we
consider the partial recovery criterion with

$$ d_{\max} = \lfloor \alpha^* k \rfloor \tag{65} $$

for some $\alpha^* \in (0,1)$ not varying with $p$. We again choose
$P_X \sim N(0,1)$. We assume $\sigma_\beta^2 = \frac{c_\beta}{k}$ for some $c_\beta > 0$ not
depending on $p$, corresponding to a fixed per-sample SNR.
We begin with the following auxiliary result.
Proposition 2. Under the preceding setup for the linear
model, the quantities $I_0$ and $V_0$ satisfy

$$ I_0 \le \frac{k}{2} \log\Big(1 + \frac{n \sigma_\beta^2}{\sigma^2}\Big), \tag{66} $$

$$ V_0 \le 2n. \tag{67} $$

Proof: See Appendix B. ■


We now proceed with the steps of Procedure 1 (with the
suitable changes from exact recovery to partial recovery;
cf. Section III-C).

Step 1: Our choice of the typical set $\mathcal{T}_\beta$ is based on
the following proposition characterizing the behavior of the
$\lfloor \alpha k \rfloor$ entries of $\beta_s$ having the smallest magnitude for fixed
$\alpha$. We define the random variable $\beta'$ to be the permutation of
$\beta_s$ whose entries are listed in increasing order of magnitude.


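Proposition 3. For any $\alpha \in (0,1]$, we have

$$ \lim_{k \to \infty} \frac{1}{k \sigma_\beta^2} \sum_{i=1}^{\lfloor \alpha k \rfloor} (\beta'_i)^2 = g(\alpha) \tag{68} $$

with probability one, where

$$ g(\alpha) := \int_0^\infty \big[\alpha - F_{\chi^2}(u)\big]^+ \, du, \tag{69} $$

and $F_{\chi^2}$ is the cumulative distribution function of a $\chi^2$
random variable with one degree of freedom.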


Proof: Letting $F_k$ be the empirical distribution of the
values $\{\beta_i^2/\sigma_\beta^2\}_{i=1}^{k}$, we have from the Glivenko-Cantelli
theorem [Thm. 19.1] that $\sup_u |F_k(u) - F_{\chi^2}(u)| \to 0$
almost surely. This immediately implies that the sum of
the $\lfloor \alpha k \rfloor$ smallest values in $\{\beta_i^2/\sigma_\beta^2\}$, normalized by the
number of values $k$, converges almost surely to the integral
of $F_{\chi^2}^{-1}(u)$ from $0$ to $\alpha$. It is easily verified graphically that
this integral can equivalently be written as (69). ■
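The convergence in (68) can be checked numerically with a short simulation. The sketch below assumes that (69) is the integral of the positive part $[\alpha - F_{\chi^2}(u)]^+$, evaluates it by a midpoint rule after the substitution $u = x^2$, and compares it with the normalized sum of the $\lfloor \alpha k \rfloor$ smallest squared standard Gaussian samples. The function names, grid sizes and seed are illustrative choices rather than part of the original analysis.

```python
import math
import random


def chi2_cdf(u: float) -> float:
    """CDF of a chi-square random variable with one degree of freedom."""
    return math.erf(math.sqrt(u / 2.0)) if u > 0 else 0.0


def g(alpha: float, x_max: float = 10.0, steps: int = 20000) -> float:
    """Midpoint-rule value of int_0^inf [alpha - F(u)]^+ du, with u = x^2."""
    h = x_max / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += max(0.0, alpha - chi2_cdf(x * x)) * 2.0 * x * h
    return total


def empirical_partial_sum(alpha: float, k: int, seed: int = 0) -> float:
    """(1/k) times the sum of the floor(alpha*k) smallest values of Z_i^2."""
    rng = random.Random(seed)
    squares = sorted(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
    return sum(squares[: int(alpha * k)]) / k


if __name__ == "__main__":
    for alpha in (0.1, 0.5, 1.0):
        print(alpha, g(alpha), empirical_partial_sum(alpha, 20000))
```

Here the samples play the role of $\beta_i / \sigma_\beta$, so that the normalization by $k$ corresponds to the factor $1/(k\sigma_\beta^2)$ in (68). For $\alpha = 1$ the integral reduces to the mean of a $\chi^2$ variable with one degree of freedom, namely one.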

Based on this result and its proof, we set $\mathcal{T}_\beta$ to be the set
of vectors $b_s$ such that $\sup_u |F_k(u) - F_{\chi^2}(u)| \le \epsilon$, where $\epsilon$ is
chosen to decay sufficiently slowly so that $P[\beta_s \in \mathcal{T}_\beta] \to 1$.
Thus, within the typical set, the empirical distribution of the
normalized squared non-zero entries closely follows that of a $\chi^2$ random variable.

An important consequence of this choice of typical set
regards the behavior of the mutual information in (56). For
a fixed set size $|s_{\mathrm{dif}}|$, the partition $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ minimizing
this mutual information is the one with the smallest value
of $\sum_{i \in s_{\mathrm{dif}}} b_i^2$. Within the typical set, we immediately obtain
from Proposition 3 that the corresponding mutual information
behaves as follows when $|s_{\mathrm{dif}}| = \lfloor \alpha k \rfloor$:

$$ I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s) \to \frac{1}{2} \log\Big(1 + \frac{c_\beta}{\sigma^2} g(\alpha)\Big), \tag{70} $$

where we recall that $c_\beta = k\sigma_\beta^2$ is a constant.

Step 2: We again make use of Proposition 1 and its
subsequent expression for $\psi_\ell$ and $\psi'_\ell$ in (59).

Step 3: We choose $\gamma = I_0 + \sqrt{V_0/\delta_0}$ as in (29) for some
$\delta_0 > 0$, thus ensuring that $P_0(\gamma) \le \delta_0$.

For the terms in Theorems 5-6 containing $\psi_\ell$ and $\psi'_\ell$,
we first note that since we are considering partial recovery,
we may focus on values of $\ell = |s_{\mathrm{dif}}|$ greater than $\alpha^* k$.
By our choice of $\mathcal{T}_\beta$, we may also focus on realizations $b_s$
of $\beta_s$ satisfying (68). For such realizations, we have for all
$s_{\mathrm{dif}}$ with $|s_{\mathrm{dif}}| = \ell = \Theta(k)$ that $\sum_{i \in s_{\mathrm{dif}}} b_i^2 = \Theta(1)$, which
implies that $\alpha_{s_{\mathrm{dif}}} = \Theta(1)$ in (58) and $I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s) = \Theta(1)$
in (56). The analogous condition to (60) thus simplifies to
$n\zeta' \gg k$ for some $\zeta' = \Theta(1)$, giving the following condition
under which the second term in (39) vanishes:

$$ n = \Omega(k), \tag{71} $$

with a sufficiently large implied constant. For the converse
part, it suffices to have the weaker condition $n = \omega(1)$.

Step 4: Combining the above steps, we get the following.


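Corollary 2. Under the preceding setup for the linear model
with $k \to \infty$, $k = o(p)$, $\sigma_\beta^2 = \frac{c_\beta}{k}$ for some $c_\beta > 0$, and
$d_{\max} = \lfloor \alpha^* k \rfloor$ for some $\alpha^* \in (0,1)$, we have $P_e(d_{\max}) \to 0$
as $p \to \infty$ provided that

$$ n \ge \max_{\alpha \in [\alpha^*,1]} \frac{\alpha k \log\frac{p}{k}}{\frac{1}{2}\log\Big(1 + \frac{c_\beta}{\sigma^2} g(\alpha)\Big)} (1+\eta) \tag{72} $$

for some $\eta > 0$, where $g(\cdot)$ is defined in (69). Conversely,
$P_e(d_{\max}) \to 1$ as $p \to \infty$ whenever

$$ n \le \max_{\alpha \in [\alpha^*,1]} \frac{(\alpha - \alpha^*) k \log\frac{p}{k}}{\frac{1}{2}\log\Big(1 + \frac{c_\beta}{\sigma^2} g(\alpha)\Big)} (1-\eta) \tag{73} $$

for some $\eta > 0$.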
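A minimal numerical sketch of the thresholds in (72)-(73), normalized by $k \log\frac{p}{k}$ and with the $(1 \pm \eta)$ factors dropped, is given below. It relies on one step that is not stated in the text: for a $\chi^2$ variable with one degree of freedom, the integral (69) is taken to equal $\alpha - 2 x_\alpha \phi(x_\alpha)$, where $\phi$ is the standard normal density and $x_\alpha$ satisfies $P[|Z| \le x_\alpha] = \alpha$; this follows from integration by parts and should be treated as an assumption of the example. The function names and the grid over $\alpha$ are hypothetical choices.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def g(alpha: float) -> float:
    """Assumed closed form of (69): E[Z^2 ; |Z| <= x_alpha] for standard normal Z."""
    prob = (1.0 + alpha) / 2.0
    if prob >= 1.0:
        return 1.0  # alpha = 1 gives the full second moment
    x = _STD_NORMAL.inv_cdf(prob)
    return alpha - 2.0 * x * _STD_NORMAL.pdf(x)


def linear_mi(alpha: float, c_beta: float, sigma2: float) -> float:
    """Limiting mutual information of the form (70), in nats."""
    return 0.5 * math.log1p(c_beta * g(alpha) / sigma2)


def normalized_thresholds(c_beta, sigma2=1.0, alpha_star=0.1, grid=1000):
    """Grid search for the maxima in (72) and (73), divided by k*log(p/k)."""
    achievability = converse = 0.0
    for i in range(grid + 1):
        alpha = alpha_star + (1.0 - alpha_star) * i / grid
        mi = linear_mi(alpha, c_beta, sigma2)
        achievability = max(achievability, alpha / mi)
        converse = max(converse, (alpha - alpha_star) / mi)
    return achievability, converse


if __name__ == "__main__":
    for snr_db in (-10, 0, 10, 20, 40):
        c_beta = 10.0 ** (snr_db / 10.0)  # sigma^2 = 1, so c_beta is the SNR
        print(snr_db, normalized_thresholds(c_beta))
```

The two outputs differ only through the numerators $\alpha$ and $\alpha - \alpha^*$, so the converse value never exceeds the achievability value, and at very high SNR both maxima are attained near $\alpha = 1$.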


Proof: The condition in is obtained using 
and By the assumption k = o{p), the numerator in 
coincides with log(f)j^^) up to remainder terms in 
Stirling’s approximation that can be factored into 77. The 
factor log(| 2 (|E|) ) ® has been factored into 7 ; 

this is valid when (5i —0 sufficiently slowly due to the 
fact that log (fc(|g*^|)) = 0 {k), whereas (again using the 
assumption k = o{p)) the numerator in ( |7^ behaves as 
u:{k). We claim that the factor 7 = 7o -f resulting 
from (29) can also be factored into $\eta$ for some vanishing
sequence of parameters Sg indexed by p. To see this, 
we consider without loss of generality the “worst-case” 
setting in which ( |72| ) holds with equality. We readily obtain 
n = Q{klog |), which in turn implies from Proposition l2| 
that Ig = 0(fclog(l -f fccr| log D) = 0(A:loglog|) and 

'JVg = 0[yJk\og |). Thus, Ig -\- is dominated by the 
numerator of ( |72l l if Jq is chosen to decay as (for example) 
®(logT)- ~ 0(A:log |) also implies 

The converse bound in ( |7^ is obtained similarly using 
except for the term a* in the numerator. To see 
how this arises, we consider an arbitrary value of a € 
(a*,l] and set i = [afcj; the case a = a* follows 
by continuity. The term log is handled in the 

same way as the term log (^7^) ^bove, so we focus on 
the term logX)d=o' (^d^)(d)- i^ upper bounded by 
maxd=o,...,<i„ax log ((1 + dmax)(^d*) (d))- Similarly to the 
achievability part, we can factor log ((1 -I- dmax) log (^)) 
into 77, so we are left with log Approximating this 

using Stirling’s approximation as before, and recalling that 
dmax = we obtain the desired term a* A: log |. ■ 

While the achievability and converse bounds in Corollary
2 do not have the same constants, the two are similar, and
always have the same scaling laws. In the limit as $c_\beta \to \infty$,
we have $\frac{1}{2}\log\big(1 + \frac{c_\beta}{\sigma^2} g(\alpha)\big) = \frac{1}{2}(\log c_\beta)(1 + o(1))$; in
this case, the maxima in (72)-(73) are both achieved with
$\alpha \to 1$, and hence, the two bounds coincide to within a
multiplicative factor of $1/(1-\alpha^*)$.

Corollary [2 is related to the setting studied by Reeves 
and Gastpar |^, fl^ , but considers k = o{p) instead 
of k = 0(p). Despite this difference, it is instructive to 
compare the bounds upon letting the implied constant in 
the 0(p) scaling tend to zero. A careful comparison reveals 
that the converse bounds coincide in this limit, whereas our 
achievability bound is slightly better, in that the analogous 
bound in | |l^ multiplies ^g{a) by (v^— 1)^ ~ 0.17; see 
El Eq. (^] and |[Tg Eq. (25)]. 

In Section |IV-E| we present some numerical results for 
this setting. 


C. 1-bit Model with Discrete $\beta_s$

We now turn to the quantized counterpart of (55):

$$ Y = \mathrm{sign}\big(\langle X_s, \beta_s \rangle + Z\big). \tag{74} $$

As in Section IV-A, we fix $s = \{1,\dots,k\}$ and let $\beta_s$ be a
uniformly random permutation of a fixed vector $(b_1,\dots,b_k)$,
and we set $P_X \sim N(0,1)$. We again write the minimum and
maximum absolute values of $\{b_i\}_{i=1}^{k}$ as $b_{\min}$ and $b_{\max}$.
The following proposition gives the required characterizations
of the mutual information terms and the corresponding
variance terms. Recall the binary entropy function $H_2(\cdot)$ and
the Q-function $Q(\cdot)$ defined in Section I-C.


Proposition 4. Under the preceding setup for the 1-bit 
model, we have the following: 


(i) The mutual information 7sdif,SBq(^s) A given by 

H 2 (q(w^ 


7sdit,SBq(('s) — ® 


y ■ 


E 


-Gsdif ^ 




(75) 


where W ~ N{0, 1). 

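(ii) If $k = \Theta(1)$, $\sigma^2 = \Theta(1)$, $b_{\min} = \Theta(b_{\max})$, and
$b_{\max} = o(1)$, then

$$ I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s) = \Big(\frac{1}{\pi\sigma^2} \sum_{i \in s_{\mathrm{dif}}} b_i^2\Big)(1 + o(1)). \tag{76} $$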


(Hi) If k = 0(p), cr^ = 0(1), and the entries of bg all 
equal a common value bg such that 6 q = 0(^^^), then the 
mutual information quantities 7sdif,SBq(^s) with |sdif| = 1 
all equal a common value Ii satisfying 




2 TTk 


6^ 


=E 


VElog 


1-QiW) 


Qiw) 




(l + o(l)) (77) 

(78) 


where W ~ W(0,1). 

(iv) The variance l^sdif,seq(^s) defined in a satisfies 




iGSdif 


iGSdit 






(79) 


+ min ^ 1, 

. z y - . 

for some universal constant cg. 

Proof: See Appendix [C] ■ 

Below we present two corollaries corresponding to different
scalings of $k$ and the SNR, namely, those given in parts
(ii) and (iii) of Proposition 4. We proceed by simultaneously
presenting the steps of Procedure 1 for both settings.

Step 1: As in Section IV-A, we choose the trivial typical
set $\mathcal{T}_\beta$ containing all vectors on the support of $P_{\beta_s}$.

Step 2: We make use of Chebyshev's inequality (see the appendix).
Choosing $\delta = \delta_2 I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s)$ therein, it follows that we may set

$$ \psi_\ell(n,\delta_2) = \psi'_\ell(n,\delta_2) = \max_{(s_{\mathrm{dif}}, s_{\mathrm{eq}}, b_s) \,:\, |s_{\mathrm{dif}}| = \ell} \frac{V_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s)}{n \delta_2^2\, I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s)^2}. \tag{80} $$

Step 3: We again choose $\gamma$ as in (26) so that $P_0(\gamma) = 0$.
Consider the setting described in part (ii) of Proposition 4.
Under the scalings therein, $I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s)$ and $V_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s)$ both behave as
$\Theta(b_{\min}^2)$. Hence, using (80) and the fact that $k = \Theta(1)$,
the second term in (39) vanishes provided that

$$ n = \omega\Big(\frac{1}{b_{\min}^2}\Big). \tag{81} $$
The setting described in part (iii) of Proposition 4 is handled
similarly. We set £ = {1} in Theorem|^ thus focusing only 
on ^ := |sdif| = 1- Denoting the corresponding variance 
^sdif.se (bs) by Vi, it follows by substituting the scalings of 
k, and 6 q into ( |79l ) that Vi = It thus follows 

from ( |78l l and ( |80l l that ij}[{n,S 2 ) vanishes provided that 

n = uj{p). (82) 


Step 4: Combining the above steps and applying asymp¬ 
totic simplifications, we obtain the following corollaries. 


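Corollary 3. Under the preceding setup for the 1-bit model
with $k = \Theta(1)$, $\sigma^2 = \Theta(1)$, $b_{\min} = \Theta(b_{\max})$, and $b_{\max} = o(1)$,
we have $P_e \to 0$ as $p \to \infty$ provided that

$$ n \ge \max_{s_{\mathrm{dif}} \ne \emptyset} \frac{|s_{\mathrm{dif}}| \log p}{\frac{1}{\pi\sigma^2} \sum_{i \in s_{\mathrm{dif}}} b_i^2} (1+\eta) \tag{83} $$

for some $\eta > 0$. Conversely, $P_e \to 1$ as $p \to \infty$ whenever

$$ n \le \max_{s_{\mathrm{dif}} \ne \emptyset} \frac{|s_{\mathrm{dif}}| \log p}{\frac{1}{\pi\sigma^2} \sum_{i \in s_{\mathrm{dif}}} b_i^2} (1-\eta) \tag{84} $$

for some $\eta > 0$.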


Proof: We obtain (83) and (84) from (37) and (42),
respectively. The denominators are obtained directly from
part (ii) of Proposition 4, and the numerators follow from
the identity $\log\binom{p-k}{|s_{\mathrm{dif}}|} = (|s_{\mathrm{dif}}| \log p)(1 + o(1))$, which
holds whenever $k = \Theta(1)$ and hence $|s_{\mathrm{dif}}| = \Theta(1)$. By
the assumption $k = \Theta(1)$, the remaining terms in (37)
(including the choice of $\gamma$ in (26)) can be factored into $\eta$.
The condition in (81) is implied by (83) (or by (84) when
equality holds) by the same argument as Corollary 1. ■


Corollary 4. Under the preceding setup for the 1-bit model 
with k = 0(p), cr^ = 0(1). and the entries of /3s 
deterministically equaling a common value Bq such that 
bo — we have Pe ^ 1 provided that 


n < 


\ogp 


=E 


VP log 1=^ 
Q{W) 


-{l-Tl) (85) 


l27vk^ 

= 0(py/logp) 

for some rj S (0,1), where W ^ N(0, 1)- 


( 86 ) 


Proof: The condition in ( |85| ) follows using ( |42l i with 
|sdif I = 1; the numerator behaves as (logp)(l + o(l)), and 
the denominator behaves according to The additional 
condition in ([82ll is satisfied when ([85|l holds with equality. 


In the same way as (62)-(63), one can compute (83)-(84)
without evaluating all $2^k - 1$ objective values; for a given
value of $|s_{\mathrm{dif}}|$, the maximum is achieved by the set $s_{\mathrm{dif}}$ with
the smallest value of $\sum_{i \in s_{\mathrm{dif}}} b_i^2$.

The asymptotic identities used in the proof of Corollary
3 can directly be applied to (62)-(63) with $k = \Theta(1)$ and
$b_{\min} = o(1)$; the resulting expressions are precisely
those in (83)-(84) with $\frac{1}{\pi}$ replaced by $\frac{1}{2}$. Thus, this is a case
where there is only a minor loss in the performance due to
the quantization; the corresponding asymptotic number of
measurements only increases by a factor of $\frac{\pi}{2} \approx 1.57$.
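The factor $\frac{\pi}{2}$ can be checked numerically in a simple special case. The sketch below takes $s_{\mathrm{dif}} = s$ (so that $s_{\mathrm{eq}}$ is empty), in which case the 1-bit mutual information follows from first principles as $\log 2 - E[H_2(Q(W\sqrt{\mathrm{SNR}}))]$ with $W \sim N(0,1)$ and $\mathrm{SNR} = \sum_{i \in s} b_i^2 / \sigma^2$. This expression is derived for the example rather than quoted from Proposition 4, and the function names and quadrature parameters are hypothetical.

```python
import math


def q_function(t: float) -> float:
    """Gaussian tail probability Q(t) = P[N(0,1) > t]."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def binary_entropy(q: float) -> float:
    """Binary entropy in nats, with the convention 0 * log 0 = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log(q) - (1.0 - q) * math.log(1.0 - q)


def linear_mi(snr: float) -> float:
    """Mutual information of the linear model, as in (56) with s_dif = s."""
    return 0.5 * math.log1p(snr)


def one_bit_mi(snr: float, w_max: float = 8.0, steps: int = 4000) -> float:
    """log 2 - E[H_2(Q(W * sqrt(snr)))], via a midpoint rule over W ~ N(0,1)."""
    h = 2.0 * w_max / steps
    total = 0.0
    for i in range(steps):
        w = -w_max + (i + 0.5) * h
        density = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
        gap = math.log(2.0) - binary_entropy(q_function(w * math.sqrt(snr)))
        total += density * gap * h
    return total


if __name__ == "__main__":
    for snr in (1e-4, 1e-2, 1.0, 100.0):
        print(snr, linear_mi(snr) / one_bit_mi(snr))
```

At low SNR the ratio of the two mutual informations approaches $\frac{\pi}{2}$, in line with the discussion above, while at high SNR the 1-bit quantity saturates at $\log 2$ and the ratio grows without bound.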

In contrast, Corollary 4 describes a setting where the
linear model and its 1-bit counterpart lead to significantly
different requirements on the number of measurements. Under
the scaling described therein, the necessary and sufficient
number of measurements for the linear model behaves as
$\Theta(p)$ (see Table I of the cited reference). Thus, the 1-bit quantization increases the
required number of measurements from linear to super-linear
in the ambient dimension.


D. 1-bit Model with Gaussian $\beta_s$ and Partial Recovery

We now consider the 1-bit counterpart of the setting
studied in Section IV-B, where $\beta_s$ is i.i.d. on $N(0, \sigma_\beta^2)$ for
some $\sigma_\beta^2 = \frac{c_\beta}{k}$, and we seek partial recovery with
$d_{\max} = \lfloor \alpha^* k \rfloor$. We make use of the following.


Proposition 5. Under the preceding setup for the 1-bit 
model, the quantity /o_+ in P0[) satisfies 



for some universal constant c'q. 


Proof: By the data processing inequality, Jg must 
satisfy ( |66] l even in the 1-bit setting. We immediately obtain 
( |87l ) from the identity Io^+ < Iq -\- a/STq given in 1^ . ■ 

We now turn to the steps for providing a counterpart to 
Corollary]^ We define the function 


T'(a,C;3,cr) := E 


H 2 [Q[ W 



( 88 ) 


where W ^ N(0, 1), and g(a) is defined in 

Step 1: We choose the same typical set Tg as that in 

I holds for all sequences 

b? 


Section IV-B thus ensuring that 
of typical vectors. It follows that 


esdif 


cggia) 

^ '^/3(1 - g{a)) for the pair (sdif,Seq) 
with corresponding sizes {£, k — £) (£ = [a/cj) such that 
X)iGsd f minimized. We observe from CD that mini- 
bf also amounts to minimizing /sjif,se (bs) 


iGsdif 

mizing 
for a fixed value of 


Sdif 


■^dif ?®eq V 

as was the case for the linear 


model. If converges to a given constant a, then the 
corresponding mutual information converges as follows, in 
accordance with (|75]l and 


"Sdif 


(bs) '^{a,cp,a). 


(89) 


Step 2: We make use of the general concentration in¬ 
equality given in Proposition in Appendix setting 
5 = 62 ls^ii,s„,^{bs) in ( |145| l in Appendixgives 

i^e(,n,S2) - ipein, 62 ) 

, ( {S2ls,,„sjbs)?n \ 

(ddd.de ““l«did=« 2{8\y\ + 2621s,,,sjb,))j- 

(90) 
Step 3: We choose $\gamma = \frac{I_{0,+}}{\delta_0}$ as in (31), ensuring that
$P_0(\gamma) \le \delta_0$. The other remainder terms are controlled


in the same way as Section IV-B We again use the fact 
that the typical realizations bs of /3s satisfy (|68ll, and yield 


= ^(1)’ hence = 0(1) in We also 
have $I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s) = \Theta(1)$; this is seen by noting that
the smallest mutual information for a fixed $|s_{\mathrm{dif}}| = \lfloor \alpha k \rfloor$
satisfies (89), and the mutual information is upper bounded
by $\log 2$ since the observations are binary. It follows that
the exponent in (90) behaves as $\Theta(n)$; hence, following the
arguments in Section IV-B, we conclude that the second
term in (39) vanishes provided that

$$ n = \Omega(k) \tag{91} $$

with a sufficiently large implied constant. Once again, for
the converse part, one can analogously show that the weaker
condition $n = \omega(1)$ suffices.

Step 4: Combining the above steps, we get the following.


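Corollary 5. Under the preceding setup for the 1-bit model
with $k \to \infty$ and $k = o(p)$, $\sigma^2 = \Theta(1)$, $\sigma_\beta^2 = \frac{c_\beta}{k}$ for some
$c_\beta > 0$, and $d_{\max} = \lfloor \alpha^* k \rfloor$ for some $\alpha^* \in (0,1)$, we have
$P_e(d_{\max}) \to 0$ as $p \to \infty$ provided that

$$ n \ge \max_{\alpha \in [\alpha^*,1]} \frac{\alpha k \log\frac{p}{k}}{\Psi(\alpha, c_\beta, \sigma)} (1+\eta) \tag{92} $$

for some $\eta > 0$, where $\Psi$ is defined in (88). Conversely,
$P_e(d_{\max}) \to 1$ as $p \to \infty$ whenever

$$ n \le \max_{\alpha \in [\alpha^*,1]} \frac{(\alpha - \alpha^*) k \log\frac{p}{k}}{\Psi(\alpha, c_\beta, \sigma)} (1-\eta) \tag{93} $$

for some $\eta > 0$.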


Proof: As usual, we begin with the conditions in 
and The denominators in (|9^-(|9^ follow di¬ 

rectly by applying ( [89| l. Moreover, the terms ak log | and 
{a—a*)k log ^ in the numerators are obtained in an identical 
fashion to Corollary once we show that there exists a 
vanishing sequence of constants Jq, indexed by p, such that 
the remainder term 7 = ^7 resulting from ( |3T| l can be 
factored into p. To see this, we note that the right-hand side 
of ( |92] i behaves as 0(fclogp), whereas from Proposition 
^ (with the scalings n = Q{k\ogp) and tT| = 0(i))’ 
To, 4. behaves as O(fcloglogp). We may thus set Jg to be 
(for example) Finally, we observe that ( |9T] i holds 

whenever ( |92| ) holcfs, and similarly for the converse part. ■ 
The main difference in (92)-(93) compared to the linear
counterparts in (72)-(73) is the behavior in the limit as $c_\beta :=
k\sigma_\beta^2 \to \infty$. As stated following the corresponding corollary for the
linear model, the denominator in the linear setting behaves as
$(\log c_\beta)(1 + o(1))$, thus tending towards infinity. In contrast, for the
1-bit setting, we have $\Psi \le \log 2$ due to the fact that $H_2(\cdot) \in [0, \log 2]$,
and thus the denominator cannot grow unbounded. These
observations are consistent with the earlier corollary showing
that 1-bit CS can require significantly more measurements
compared to the linear setting when the signal-to-noise ratio
(SNR) is sufficiently high.



Figure 2: Asymptotic thresholds on the number of measurements
required for partial support recovery for the linear and
1-bit models, with $\alpha^* = 0.1$. The number of measurements
is normalized by $k \log\frac{p}{k}$, and $\mathrm{SNR}_{\mathrm{dB}}$ is defined in (94).


E. Numerical Evaluations for Partial Recovery 

In this subsection, we present numerical calculations for
the settings considered in Sections IV-B and IV-D. We set
$\alpha^* = 0.1$, $\sigma^2 = 1$, and $k = o(p)$. We consider values of $\sigma_\beta^2$
of the form $\frac{c_\beta}{k}$ for fixed $c_\beta$. Similarly to

we present our results in terms of 

$$ \mathrm{SNR}_{\mathrm{dB}} := 10 \log_{10} \frac{k \sigma_\beta^2}{\sigma^2} = 10 \log_{10} c_\beta, \tag{94} $$

which represents the per-sample SNR in dB.
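
As a small illustration of the normalization in (94), the following sketch converts between $c_\beta$ and the SNR in dB. The function names are hypothetical, and the base-10 logarithm is an assumption based on the usual decibel convention; with $\sigma^2 = 1$ the ratio $k\sigma_\beta^2/\sigma^2$ reduces to $c_\beta$.

```python
import math

def snr_db_from_c_beta(c_beta, sigma_sq=1.0):
    """Per-sample SNR in dB, assuming sigma_beta^2 = c_beta / k (so k * sigma_beta^2 = c_beta)."""
    return 10.0 * math.log10(c_beta / sigma_sq)

def c_beta_from_snr_db(snr_db, sigma_sq=1.0):
    """Inverse map: the value of c_beta that produces a given SNR in dB."""
    return sigma_sq * 10.0 ** (snr_db / 10.0)

if __name__ == "__main__":
    for c_beta in (0.1, 1.0, 10.0, 100.0):
        print(c_beta, snr_db_from_c_beta(c_beta))
```

This is only a convenience for reading the horizontal axis of a plot such as Figure 2; it plays no role in the bounds themselves.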

Figure 2 plots the asymptotic thresholds on the number of 
measurements from Corollaries |2]and|3 For both the linear 
and 1-bit settings, there is a close correspondence between 
the necessary and sufficient number of measurements. The 
bounds for the two models nearly coincide at low SNR, 
which is consistent with Corollary]^ 

The behavior of the bounds at high SNRs is also con¬ 
sistent with our previous discussions. In the linear setting, 
the ratio between the bounds narrows to approximately 
1.11 as the SNR grows large, which coincides with the 
value given in the discussion following Corollary 

Moreover, as discussed following Corollary the number 
of measurements steadily decreases for increasing SNRs for 
the linear model, while saturating at an asymptotic limit for 
the 1-bit model. 



F. Group Testing

1) Noiseless Case with Exact Recovery: Here we consider
the noiseless group testing problem, where each observation
is deterministically generated according to

$$ Y = \mathbb{1}\Big\{ \bigcup_{i \in S} \{X_i = 1\} \Big\}. \tag{95} $$
We consider Bernoulli measurement matrices with $P_X(1) =
1 - P_X(0) = \frac{\nu}{k}$, where $\nu > 0$ is a constant not depending on $p$.
Here there is no latent variable $\beta_s$, which can equivalently be
thought of as corresponding to $\beta_s$ equaling the vector of ones
deterministically. This implies that $I_{s_{\mathrm{dif}}, s_{\mathrm{eq}}}(b_s)$ depends only
on $\ell := |s_{\mathrm{dif}}|$, and we emphasize this by writing it as $I_\ell$. Our
setting readily handles both fixed and growing $k$; since the
former is already well understood, we focus
our attention on the case that $k \to \infty$, and in particular on
the case that $k = \Theta(p^\theta)$ for some $\theta \in (0,1)$.
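
To make the model in (95) concrete, here is a minimal simulation sketch of noiseless group testing with an i.i.d. Bernoulli design in which each item is placed in each test with probability $\nu/k$. The function name, the seed handling, and the list-of-lists representation are illustrative assumptions rather than anything prescribed by the analysis.

```python
import random

def simulate_noiseless_group_testing(p, k, n, nu, seed=0):
    """Draw a uniformly random support of size k, an n-by-p Bernoulli(nu/k) test
    matrix, and the noiseless outcomes Y = OR of the defective items in each test."""
    rng = random.Random(seed)
    support = set(rng.sample(range(p), k))
    prob_one = nu / k
    tests = []
    outcomes = []
    for _ in range(n):
        row = [1 if rng.random() < prob_one else 0 for _ in range(p)]
        tests.append(row)
        # The outcome is 1 exactly when at least one defective item is in the test.
        outcomes.append(1 if any(row[i] for i in support) else 0)
    return support, tests, outcomes

if __name__ == "__main__":
    support, tests, outcomes = simulate_noiseless_group_testing(p=50, k=5, n=20, nu=0.69)
    print(sorted(support), outcomes)
```

With $\nu = \log 2$, each test is positive with probability close to one half for large $k$, which is the regime singled out later in this subsection.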

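Proposition 6. Under the noiseless group testing setup,
consider arbitrary sequences of sparsity levels $k \to \infty$ and
$\ell \in \{1, \dots, k\}$, indexed by $p$. If $\frac{\ell}{k} = o(1)$, then

$$ I_\ell = \Big( e^{-\nu} \nu \frac{\ell}{k} \log\frac{k}{\ell} \Big) (1 + o(1)). \tag{96} $$

Moreover, if $\frac{\ell}{k} \to \alpha \in (0, 1]$, then

$$ I_\ell = e^{-(1-\alpha)\nu} H_2\big(e^{-\alpha\nu}\big) (1 + o(1)). \tag{97} $$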

Proof: See Appendix [P] ■ 
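
The limit in (97) can be checked numerically. The sketch below assumes the finite-$k$ expression $I_\ell = (1 - \nu/k)^{k-\ell} H_2\big((1 - \nu/k)^{\ell}\big)$, which follows from the model in (95) because the outcome is informative about $X_{s_{\mathrm{dif}}}$ only when every item in $s_{\mathrm{eq}}$ is absent from the test; this expression and the function names are not taken from the text above. Entropies are in nats.

```python
import math

def binary_entropy(x):
    """Binary entropy in nats, with the convention H_2(0) = H_2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def exact_mutual_information(k, ell, nu):
    """Finite-k value of I_ell under a Bernoulli(nu/k) design (assumed closed form)."""
    q = 1.0 - nu / k
    prob_seq_all_zero = q ** (k - ell)   # no item of s_eq appears in the test
    prob_dif_all_zero = q ** ell         # no item of s_dif appears in the test
    return prob_seq_all_zero * binary_entropy(prob_dif_all_zero)

def asymptotic_mutual_information(alpha, nu):
    """Leading term of I_ell when ell / k tends to alpha in (0, 1]."""
    return math.exp(-(1.0 - alpha) * nu) * binary_entropy(math.exp(-alpha * nu))

if __name__ == "__main__":
    nu = math.log(2)
    for k in (10**2, 10**4, 10**6):
        print(k, exact_mutual_information(k, k // 2, nu) / asymptotic_mutual_information(0.5, nu))
```

The printed ratios should approach one as $k$ grows. The regime $\ell/k = o(1)$ in (96) converges more slowly because of the logarithmic factor, so it is not checked here.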

We proceed with the steps of Procedure 1.

Step 1: The first step is trivial; $\beta_s$ is deterministic, and
thus the typical set is a singleton.

Step 2: In contrast to the previous examples, we use
different concentration inequalities to handle different values
of $\ell$. Moreover, in accordance with the earlier remark, we let $\delta_2$
depend on $\ell$, writing it as $\delta_{2,\ell}$. For “large” values of $\ell$ (to
be made precise below), we will apply the general bound in
Proposition 10 in Appendix A. For “small” values of $\ell$, we
use the following to obtain an improved bound.

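Proposition 7. For the noiseless group testing problem,
consider sequences $k \to \infty$ and $\ell$, indexed by $p$, such that
$\frac{\ell}{k} \to 0$. For any $\epsilon > 0$ and $\delta_{2,\ell} \in (0, 1)$ bounded away from
zero and one, the following holds for sufficiently large $p$:

$$ \mathbb{P}\big[ \imath^n(\mathbf{X}_{s_{\mathrm{dif}}}; \mathbf{Y} | \mathbf{X}_{s_{\mathrm{eq}}}, b_s) \le n I_\ell (1 - \delta_{2,\ell}) \big] \le \exp\Big( -n \frac{\ell}{k} e^{-\nu} \nu \big( (1 - \delta_{2,\ell}) \log(1 - \delta_{2,\ell}) + \delta_{2,\ell} \big) (1 - \epsilon) \Big) \tag{98} $$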

For the converse, we only use the latter of these two cases,
setting $\psi'_\ell(n, \delta_{2,\ell}) = 2 \exp\Big( -\frac{(\delta_{2,\ell} I_\ell)^2 n}{2(16 + 2\delta_{2,\ell} I_\ell)} \Big)$.

Step 3: Since $\beta_s$ is deterministic, we may trivially set
$\gamma = 0$ to obtain $P_0(\gamma) = 0$ in (25).

For the converse, we set $\mathcal{L} = \{k\}$ in the converse theorem. From
the above choice of $\psi'_\ell$ and the growth of $I_\ell$ in (97), we
immediately obtain that $\psi'_k \to 0$ whenever $n = \omega(1)$. The
achievability part requires more effort; we summarize the
findings in the following proposition.

Proposition 8 . Let k = 0(p®) for some 6 G (0,1). 

(i) For any rj > 0, there exists G (0,1) and a choice 

of e > 0 in ( |99l > such that <^ 2 ^^) 0 

provided that 

$$ n \ge \frac{\frac{\theta}{1-\theta} k \log\frac{p}{k}}{e^{-\nu}\nu} (1 + \eta). \tag{101} $$

(ii) For any Sk G (0,1), we have 

^ 0 provided n = H(fclog^). 

Proof: See Appendix [D] ■ 

The idea here is that for the smaller values of $\ell$, it is
the concentration inequality that dominates the final bound,
so we let $\delta_{2,\ell} = \delta_2^{(1)}$ be closer to one to provide better
concentration behavior. For large values of $\ell$, the opposite
is true, so we let $\delta_{2,\ell} = \delta_2^{(2)}$ be close to zero.

Step 4: We obtain the following corollary by combining 
the previous steps and applying asymptotic simplifications. 

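Corollary 6. For the noiseless group testing problem with
$k = \Theta(p^\theta)$ ($\theta \in (0,1)$) and an optimized parameter $\nu$, we
have $P_e \to 0$ as $p \to \infty$ provided that

$$ n \ge \inf_{\nu > 0} \max\Big\{ \frac{\theta}{e^{-\nu}\nu(1-\theta)}, \frac{1}{H_2(e^{-\nu})} \Big\} \Big( k \log\frac{p}{k} \Big) (1 + \eta) \tag{102} $$

for some $\eta > 0$. Conversely, we have $P_e \to 1$ as $p \to \infty$
whenever

$$ n \le \frac{k \log\frac{p}{k}}{\log 2} (1 - \eta) \tag{103} $$

for some $\eta > 0$.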


for all (sdif,Seq) with |sdif| = £■ 

Proof: See Appendix ID ■ 

From the bounds in (|^i and ( |145| l in Appendix 
we may fix e > 0 and choose the following when p is 
sufficiently large: 

.For£<£<Lp|^J: 

exp ^-n^e"'"i/(^(l-(52.^) log(l-(52,^)+(52,^) (1-e)^ • 

(99) 




= 2 exp - 


{ 82 ,tIiYn 

2{lQ + 262yh))' 


( 100 ) 


Proof: We first consider the achievability part. We
immediately obtain the first term in the maximum in (102)
from (101), so it remains to derive the second term. We
start with the general achievability condition; by substituting $\gamma = 0$ and taking $\delta_1 \to 0$
sufficiently slowly, we obtain

$$ n \ge \max_{\ell = 1, \dots, k} \frac{\log\binom{p-k}{\ell} + 2\log\big(k\binom{k}{\ell}\big)}{I_\ell (1 - \delta_{2,\ell})} (1 + o(1)). \tag{104} $$

Using (96) and the asymptotic identity $\log\binom{p-k}{\ell} =
\Theta\big(\ell \log\frac{p}{\ell}\big)$, we see that the objective in (104) behaves as

$$ \Theta\Big( \frac{k \log\frac{p}{\ell}}{1 + \log\frac{k}{\ell}} \Big) \tag{105} $$

whenever the constants $\{\delta_{2,\ell}\}$ are bounded away from one.
This behaves as $\Theta\big(k\log\frac{p}{k}\big)$ when $\frac{\ell}{k} = \Theta(1)$, and as
$o\big(k\log\frac{p}{k} + k\big)$ when $\frac{\ell}{k} = o(1)$ (the latter of these is seen by
writing $\log\frac{p}{\ell} = \log\frac{p}{k} + \log\frac{k}{\ell}$). Thus, the maximum in (104)
can only be achieved by a sequence such that $\frac{\ell}{k} = \Theta(1)$.
Moreover, with $\frac{\ell}{k} = \Theta(1)$, we see from the assumption
$k = o(p)$ that the term $2\log\big(k\binom{k}{\ell}\big) = O(k)$ is dominated
by $\log\binom{p-k}{\ell} = \Theta\big(k\log\frac{p}{k}\big)$, and can thus be factored into
the $o(1)$ remainder term in (104). This yields the condition

$$ n \ge \max_{\ell = 1, \dots, k} \frac{\ell \log\frac{p}{\ell}}{I_\ell (1 - \delta_{2,\ell})} (1 + o(1)). \tag{106} $$


Since the maximum can only be achieved asymptotically if
$\frac{\ell}{k} = \Theta(1)$, we proceed by considering $\frac{\ell}{k} \to \alpha$ for some
arbitrary $\alpha \in (0,1]$. Under this scaling, $\ell\log\frac{p}{\ell}$ behaves as
$\big(\alpha k \log\frac{p}{k}\big)(1 + o(1))$. Moreover, according to Proposition 8,
we can choose $\delta_{2,\ell}$ to be arbitrarily small for all $\ell$ values
except the “small” ones. Such values behave as $o(k)$, and
thus do not achieve the maximum in (106). Combining these
observations with (97), the right-hand side of (106) yields
the condition

$$ n \ge \max_{\alpha \in (0,1]} \frac{\alpha k \log\frac{p}{k}}{e^{-(1-\alpha)\nu} H_2(e^{-\alpha\nu})} (1 + \eta), \tag{107} $$

where $\eta$ may be arbitrarily small. By a change of variable
$\lambda = e^{-\alpha\nu}$, the coefficient to $k\log\frac{p}{k}$ can be written
as $\frac{1}{e^{-\nu}\nu} \cdot \frac{-\lambda\log\lambda}{H_2(\lambda)}$, which is easily verified to be decreasing in
$\lambda \in [0,1]$. This implies that the maximizing value of $\alpha$ is
one, and yields the second term in (102).

The converse part is similar but considerably simpler;
by setting $\mathcal{L} = \{k\}$ in the converse theorem, we obtain $\alpha = 1$
immediately. The denominator $\log 2$ in (103) is obtained by
maximizing $H_2(e^{-\nu})$ over $\nu$, and the condition $n \to \infty$
stated before Proposition 8 is clearly satisfied when (103)
holds with equality. ■

By setting $\nu = \log 2$ in (102), it is readily verified that the
necessary and sufficient conditions coincide for $\theta \le \frac{1}{3}$, and
in fact yield the same threshold as adaptive group testing.
To our knowledge, this was only known previously in
the limit as $\theta \to 0$. Further comparisons to previous
works are provided at the end of this subsection.
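
The coincidence of the two conditions can be checked directly. The sketch below evaluates the two terms inside the maximum of (102) as reconstructed from the surrounding derivation, namely $\theta / (e^{-\nu}\nu(1-\theta))$ and $1/H_2(e^{-\nu})$, and compares them with the converse coefficient $1/\log 2$ from (103). All coefficients multiply $k\log\frac{p}{k}$; the function names are hypothetical.

```python
import math

def binary_entropy(x):
    """Binary entropy in nats, with H_2(0) = H_2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def achievability_coefficient(theta, nu):
    """Maximum of the two terms in the sufficient condition, for a fixed nu."""
    concentration_term = theta / (math.exp(-nu) * nu * (1.0 - theta))
    mutual_information_term = 1.0 / binary_entropy(math.exp(-nu))
    return max(concentration_term, mutual_information_term)

def converse_coefficient():
    """Coefficient in the necessary condition."""
    return 1.0 / math.log(2)

if __name__ == "__main__":
    nu = math.log(2)
    for theta in (0.1, 0.2, 1.0 / 3.0, 0.4, 0.5):
        print(round(theta, 3), achievability_coefficient(theta, nu), converse_coefficient())
```

With $\nu = \log 2$ the first term equals $\frac{2\theta}{(1-\theta)\log 2}$, which is at most $\frac{1}{\log 2}$ exactly when $\theta \le \frac{1}{3}$; above that value the sketch shows the sufficient condition exceeding the necessary one for this particular choice of $\nu$.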

2) Noisy Case with Exact Recovery: We now turn to the
noisy counterpart of (95):

$$ Y = \mathbb{1}\Big\{ \bigcup_{i \in S} \{X_i = 1\} \Big\} \oplus Z, \tag{108} $$

where $Z \in \{0,1\}$ is additive noise, and $\oplus$ denotes modulo-2
addition. For concreteness, we focus on the case that
$Z \sim \mathrm{Bernoulli}(\rho)$ for some $\rho \in (0, \frac{1}{2})$ not varying with
$p$, though other noise models also fall into our framework.
As discussed below, we do not attempt to
provide results with constants that are optimized to the same
extent as the noiseless case, and we thus set $\nu = \log 2$, i.e.,
$P_X \sim \mathrm{Bernoulli}\big(\frac{\log 2}{k}\big)$.

We follow Procedure[T]in a similar fashion to the noiseless 
case, altering the statements of Proposition [6]-[^ accordingly. 
To avoid repetition, we give the modified propositions and 
their proofs in Appendix and state the resulting corollary 


here. The main difference is that in the analog of Propo¬ 
sition we let (52^^ remain arbitrary, thus leading to the 
optimization parameter <52 in the following. 

Corollary 7. Under the preceding setup for the noisy group
testing problem with $\rho \in (0, 0.5)$, $\nu = \log 2$, and $k = \Theta(p^\theta)$
($\theta \in (0,1)$), we have $P_e \to 0$ as $p \to \infty$ provided that


n> inf max < C(p, <^ 2 ; ^), 
^ 26 ( 0 . 1 ) 


for some rj > 0 , where 


1 


log 2 - 772 (p) 
X (1 


{k\og 
V) (109) 


ap,S2,0) := 


2(1+152(1-2p))^ 

log2“^^l 6Ul-2p)^ 

1+49 

__i-e__ 

(l_2p)logl^(l-52) 
Conversely, we have $P_e \to 1$ as $p \to \infty$ whenever

$$ n \le \frac{k \log\frac{p}{k}}{\log 2 - H_2(\rho)} (1 - \eta) $$

( 110 ) 


( 111 ) 


for some rj > 0 . 

Proof: See Appendix]^ ■ 

As we will see in the numerical examples below, Corollary 7
provides an exact asymptotic threshold for a narrower
range of $\theta$ values compared to the noiseless case. This is due
to the difficulty in precisely characterizing the concentration
behavior of the information density tail probabilities.
Nevertheless, the second term in the maximum in (109)
is always dominant for sufficiently small $\theta$, thus matching
the converse. To see this, we first note that the first term
in the maximum in (110) tends to zero as $\theta \to 0$, and
cannot be dominant in this limit. This implies that $\delta_2$ may be
arbitrarily close to zero provided that $\theta$ is sufficiently small.
Assuming then that $\delta_2$ and $\theta$ are small and the maximum
in (110) is achieved by the second term, we can write
C(p, 82 , 9 ) Ki (i- 2 p)Lg Strictly smaller than 

14 in Appendix 


log 2 -^ 2 (p) ’ Proposition 
3) Partial Recovery: The 
ery (cf (0) in fact leads to simpler expressions and proofs, 
as seen in the following. 


consideration ofpartial recov- 


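Corollary 8. Under the preceding setup for the group
testing problem with $\rho \in [0, 0.5)$ (i.e., possibly noiseless),
$\nu = \log 2$, $k \to \infty$, $k = o(p)$, and $d_{\max} = \lfloor \alpha^* k \rfloor$ for some
$\alpha^* \in (0,1)$, we have $P_e(d_{\max}) \to 0$ as $p \to \infty$ provided
that

$$ n \ge \frac{k \log\frac{p}{k}}{\log 2 - H_2(\rho)} (1 + \eta) \tag{112} $$

for some $\eta > 0$. Conversely, $P_e(d_{\max}) \to 1$ as $p \to \infty$
whenever

$$ n \le \frac{(1 - \alpha^*)\big(k \log\frac{p}{k}\big)}{\log 2 - H_2(\rho)} (1 - \eta) \tag{113} $$

for some $\eta > 0$.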
Figure 3: Asymptotic thresholds on the number of measurements required for noiseless group testing.

Figure 4: Asymptotic thresholds on the number of measurements required for noisy group testing.


Proof: The achievability part follows the proofs of 
Corollaries]^ and 1^ except that the “small” values of i need 
not be handled. That is, we only make use of the general 
concentration inequality in ( |145[ l in Appendix |A| and we end 
up with the single condition in ( |1 12) . For the converse part, 
we again choose C = {k} in Theorem]^ and the steps are 
again similar, with the multiplicative factor 1 — a* arising 
via identical reasoning to Corollary ■ 

Corollary 8 shows that at least for sufficiently small $\theta$
(e.g., $k = O(p^{1/3})$ in the noiseless case), there is not much to
be saved by moving from exact recovery to partial recovery:
allowing for a fraction $\alpha^*$ of errors leads to at most a
reduction in the number of measurements of a multiplicative
factor $1 - \alpha^*$.
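
The gap between the two conditions of Corollary 8 is easy to tabulate. The following sketch computes the coefficients of $k\log\frac{p}{k}$ in the sufficient condition (112) and the necessary condition (113), ignoring the $(1 \pm \eta)$ factors; the function names are hypothetical and entropies are in nats.

```python
import math

def binary_entropy(x):
    """Binary entropy in nats, with H_2(0) = H_2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def partial_recovery_coefficients(rho, alpha_star):
    """Return (sufficient, necessary) coefficients of k*log(p/k) for noise level rho
    and allowed error fraction alpha_star."""
    capacity = math.log(2) - binary_entropy(rho)
    sufficient = 1.0 / capacity
    necessary = (1.0 - alpha_star) / capacity
    return sufficient, necessary

if __name__ == "__main__":
    for rho in (0.0, 0.01, 0.11):
        print(rho, partial_recovery_coefficients(rho, alpha_star=0.1))
```

For every noise level the ratio of the two coefficients is exactly $1 - \alpha^*$, which is the multiplicative saving discussed above.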

4) Numerical Evaluations: In Figure 3, we compare the
bounds in Corollary 6 with existing asymptotic bounds in the
literature. For convenience, we switch to base-2 logarithms
and plot the asymptotic limit of the ratio $\frac{k\log_2\frac{p}{k}}{n}$, so that
a higher value corresponds to fewer measurements. We see
that our achievability bound improves on all of the existing
bounds; however, we note that the Combinatorial Orthogonal
Matching Pursuit (COMP) [11] and Definite Defective (DD)
[43] algorithms are computationally tractable and do not
require knowledge of $k$.

The converse bound shown is known to hold even for
adaptive measurement matrices. Thus, a key implication
of our results is that adaptivity provides no asymptotic
gain over non-adaptive Bernoulli measurements when $k =
O(p^{1/3})$. It remains an important open problem to derive
practical decoding schemes for achieving the bound in the
non-adaptive setting.

Figure 4 provides an analogous plot for the noisy case,
with three different noise levels (i.e., values of $\rho$). In each
case, we obtain an exact threshold for sufficiently small $\theta$,
albeit over a narrower range than the noiseless case. Once
again, the converse is known to hold even in the adaptive
setting, and we have thus provided cases where non-adaptive
Bernoulli measurements yield the same asymptotics
as optimal adaptive measurements. To our knowledge, this
has not been shown previously even in the limit as $\theta \to 0$.


G. General Strong Impossibility Result for Discrete Observation Models

Equation (144) in Appendix A bounds the variance of
the information density uniformly in terms of the output
alphabet size for models with discrete observations. Notable
examples include group testing, the 1-bit model (or more
generally, quantizations with more than two levels), and
logistic regression. We obtain the following general strong
impossibility result (i.e., conditions under which $P_e \to 1$)
by combining Proposition 9 in Appendix A with a variant
of our general converse theorem.


Corollary 9. If the observations lie in a finite set 3^ C K 
with probability one, then ^ 1 whenever there exist 
vanishing sequences —>■ 0 and Cp —>■ 0 such that 


n > 


max 

(^difi^eq) ■ ddif'^0 


log 


/p-/c-|-|sditK 
V bdifl / 


- log<5i,p 


-^Sdit.Seq (^s) + 



(114) 


for all bg G within a set whose probability under Pp^ 
approaches one. 


Proof: In this application, we do not use Theorem 
directly, but instead follow the arguments leading up to it 
with (|401 i-(|4T]i replaced by 

log + I'Sdifh > n(/,^,f,deq(&d)+<5); (115) 

V IsdifI J 

and 




,((>«)+^) I/3d 



ml? 

6‘^n 


( 116 ) 
By Proposition in Appendix we have for all 
(sdif, Seqj that p 16| ) holds, so the analogous probability 
to that on the right-hand side of ( |45| ) is dictated only by 
( |115| ). Moreover, the right-hand side of ( |116| l tends to one 
upon setting 6 = for some Cp —> 0. By also setting 

<5i = (5i,p —>■ 0 (so that the analogous additive term to that 
of i5i in ( |45| l vanishes), we see that ( |115| l coincides with 
( fTT4l ). ■ 


When $\beta_s$ is deterministic, this result recovers a recent
result by Tan and Atia [25], which was proved using combinatorial
techniques. Our result is in fact slightly stronger
in the sense that the additive term in the denominator 
only behaves as a;(^), whereas the corresponding term 
in 125 behaves as Thus, in our result, the mutual 

information term remains the dominant one in a wider range 
of settings. 


V. Proofs of General Bounds 

Here we provide the proofs of Theorems and and then 
give the changes required to obtain the results for partial 
recovery in Section III-C As mentioned previously, the 
proofs bear some resemblance to those of mixed channels 
in channel coding | [29l Sec. 3.3]. However, the analysis 
here is more involved, primarily due to the fact that the 
“codewords” are not independent for different values of 
s € iS, but instead share common columns corresponding to 
the overlapping parts of the support set. See 0^ 0^ GZI 
for further discussions on the differences between support 
recovery and channel coding. 


A. Proof of Theorem 

1) Initial Non-Asymptotic Bound: Recall the definitions 
of the random variables in ©-(in), and the information 
densities in ([T9]l-(|2T|l. We fix the constants yi,... ,jk arbi¬ 
trarily, and consider a decoder that searches for the unique 
set s S iS such that 


*(xsd.f;y|xs,J > 7|sdifi (117) 

for all 2^' — 1 partitions (sdif, Seq) of s with Sdif f 0. An 
error occurs if no such s exists, if multiple exist, or if such 
a set differs from the true value. 

Since the joint distribution of (/3s, Xg, | S' = s) is 
the same for all s in our setup {cf Section |^, and the 
decoder that we have chosen exhibits a similar symmetry, 
we can condition on a fixed and arbitrary value of S, say 
s = {1,..., fc}. By the union bound, the error probability 
is upper bounded by 


P.< 


U {*(Xsd,;Y|Xs,J<7|.dd}] 

(®dif i^eq) 

+ ^ P i(Xs\s; Y|Xsns) > 7|sdif| 

sG5\{s} 


, (118) 


where here and subsequently we let the condition $s_{\mathrm{dif}} \ne
\emptyset$ remain implicit. The first term in (118) corresponds to
the true set failing the threshold test, and the second term
corresponds to some incorrect set $\bar{s}$ passing the threshold
test. In the summand of the second term, we have upper
bounded the probability of an intersection of $2^k - 1$ events
by just one such event, namely, the one corresponding to
$s_{\mathrm{dif}} = \bar{s} \setminus s$ and $s_{\mathrm{eq}} = s \cap \bar{s}$.

Using the shorthand $\ell := |\bar{s} \setminus s|$, we can weaken the second
probability in (118) as follows:


Y|Xsns) ^ ‘Tb 


^sns,^s\s,y 

f ^Y|X,..,X, (y|Xs\s, Xsns) , 

X 1 log \ - >7^ (119) 


'PY|x,^q(y|xsns) 

xsns,x^\s,y 

X ^Y|x,^.jX_(y|xs\s,Xsns)e"^" 


( 120 ) 

( 121 ) 


where in ( |119| l we used the fact that the output vector de¬ 
pends only on the columns of Xs corresponding to entries of 
s that are also in s, and ( |120| l follows by bounding Pyix^^ 
using the event within the indicator function, and then upper 
bounding the indicator function by one. Substituting dm] ) 
into dTTSj ) gives 


Pe<P 


y {z(Xs,,;Y|Xs,J<74] 

(•Sdif i^eq) 



( 122 ) 


where the combinatorial terms arise from a standard count¬ 
ing argument 0- 

Note that while the bound in \\22\ appears to be simpler 
than that in the theorem statement, it is difficult to directly 
apply it to specific problems, since zjXg^jj; YjXg^ ) is not 
an i.i.d. summation in general. 

2) Completion of the Proof: We fix the constants 
7 j,..., 7 ^ arbitrarily, and apply the following elementary 
steps with $\ell = |s_{\mathrm{dif}}|$:


U {*(X,,,;Y|X,„J<7.}] 

P 

(^dif )'Seq) 





(^dif 5®eq) 


< 


U {log 


Pv|x.,,(Y|X.^J 


(^dif i^eq) 


n log 


Py|x,,,(Y|X, 
Py|x.„ (Y|X,,, 


< It 


+ 1 


U log 


(®dif j"®© 


/^y|x,,,/5.(Y|X,„,,/3,) 
Py|x.„,(Y|X,,J 
/V|x,,,/3.(Y|X,,„/3,) 


<it 


> It 


(dll)), and we can thus write the overall term as 
logPY|x,(Y|X,) 


(®dif )^eq) 


(130) 


(123) Using the same steps as those used in ( |123| )-( fT25l l, we can 
upper bound this by 


logPY|X,/ 3 ,(Y|X,,/?,) 


< max {logPY|x_/ 3 .(Y|X^^^,/ 3 s) + 7 ^ + 7 ^ + 7 } 

(®dif i^eq) 


+ ] 


, Uy|x,a(Y|X„/3«) ■ 

Py|x.(Y|X.) 


(131) 


(124) 


for any constant 7 . Reversing the step in ( |130[ ), this can 
equivalently be written as 




U 1 log 


V^dif i^eq j 


P- 


Y|X_,X. 


(Y|X,,„X,^J 


P 


Y|X, 


+ 1 


U 1 log 


(sdit.s 


P 


Y|X,, 


^;3,(Y|X,,,,/3,) 
,(Y|X,,J 


< It 


Py|x.,A.(Y|X«,„/3,) 


> it 


(125) 


where 7 ^ = It + it The second term in ( |125| l is upper 
bounded as 


,, ,, Py|x..,(Y|X,J ^ _ 

U 1 log 73-7777?-^ > It 


("Sdif i^eq) 

^ E 

(■^dif 5^eq) 


log 


Py|X,,,;3,(Y|X,^,,^,) 

Uy|x,, (Y|X, 


>it 


Uy|X..,/3.(Y|X,^„/3,) 

= E E ppAb,)pT^''~'\-^sj 

(Sdif.Seq) 6=,X,eq.y 

X TY|x.,,/3,(y|x.,,,6s) 

r Uy|x,, (yKJ , 
x^s log ^- Vt-^ > le 

[ TY|X.„q/3,(y|Xdeq,('s) 

^ E E pmpT^'~'\xsJ 


(^dif^^eq) j^Seq ?y 


= E 

t=i 


o-ll 


(126) 


(127) 


X Py|x,„ (y|Xd,Je (128) 


(129) 


where ( |126| l follows from the union bound, and the remain¬ 
ing steps follow the arguments used in ( |119| l-( [T2T] ). 

We now upper bound the first term in ( |125| l. The numer¬ 
ator in ( |1251 l equals PY|Xa(Y|Xs) for all (sdifAeq) if.. 


U (log 

(^dif i^eq) 

< 7^ + 7^ + 7 


Ty |x._j. j x,,q /3, (YI Xg j, Xs_^^, ) 

Uy|x,,A.(Y|X,„,,/?,) 


Py|x.a(Y|X„^,) 
log — , — > 7 


Py|x,(Y|X,) 


(132) 


Observe that the first logarithm appearing here is precisely 
the information density in (|20li. Moreover, the choices 


It = log 


it = log 





(133) 

(134) 


make ( |129[ l and the second term in P22| l be upper bounded 
by i5i each. Hence, and combining ( 125| l with ( |129| l and 
( |132| l, and recalling that i = |sdif|, we obtain 


B. Proof of Theorem 

As has been done in several previous proofs of 
information-theoretic converse bounds for sparsity pattern 
recovery 0 , 0 , GD, we consider an argument based on a 
genie. As explained formally below, the genie reveals some
of the elements of the support set to the decoder, which is
left to estimate the remaining entries. An important novelty
in our arguments is that we also let the revealed indices 
depend on the random non-zero entries of /3; this leads to 
the improvement stated following Theorem]^ 

It will prove convenient to present the proof under the 
following assumption of symmetry. 

Assumption 1. The pair (sdif(fes), Seq(fes)) in Theorem]^ 
satisfies the following property: If $b'_s$ is a permutation of
$b_s$, then the entries of $b'_s$ indexed by $s_{\mathrm{dif}}(b'_s)$ (respectively,
$s_{\mathrm{eq}}(b'_s)$) are a permutation of the entries of $b_s$ indexed by
$s_{\mathrm{dif}}(b_s)$ (respectively, $s_{\mathrm{eq}}(b_s)$).
We claim that the theorem statement under this assumption
also implies the more general case. To see this, we
use the symmetry of $P_{Y|X_S \beta_S}$ with respect to $S$ from
Section II and the fact that $X$ has an i.i.d. distribution.
Among all the possible choices of functions $(s_{\mathrm{dif}}(\cdot), s_{\mathrm{eq}}(\cdot))$,
there always exists a pair that maximizes the lower bound
in (33) and satisfies Assumption 1. More precisely, for any
realization of $\beta_s$, the probability in (33) is determined by the
entries appearing in the partition $(\beta_{s_{\mathrm{dif}}}, \beta_{s_{\mathrm{eq}}})$ but not by
their order, so one can always maximize (33) by forming
this partition in a manner which is symmetric with respect
to permutations of $\beta_s$.

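We now formally define the genie-aided setup as follows:

1. Generate a random $k$-dimensional vector $\beta' \sim P_{\beta_s}$.

2. Given $\beta'$, let $\beta'_{\mathrm{dif}}$ and $\beta'_{\mathrm{eq}}$ be the subvectors indexed
by $s_{\mathrm{dif}}(\beta')$ and $s_{\mathrm{eq}}(\beta')$, respectively.

3. Let $\beta_{\mathrm{dif}}$ (respectively, $\beta_{\mathrm{eq}}$) be a uniformly random
permutation of $\beta'_{\mathrm{dif}}$ (respectively, $\beta'_{\mathrm{eq}}$).

4. Generate $S_{\mathrm{eq}}$ uniformly on $\mathcal{S}_{\mathrm{eq}}(\ell)$, defined to contain
the subsets of $\{1, \dots, p\}$ having cardinality $k - \ell$,
where $\ell = |\beta_{\mathrm{dif}}|$. Set $\beta_{S_{\mathrm{eq}}} = \beta_{\mathrm{eq}}$.

5. Generate $S_{\mathrm{dif}}$ uniformly on $\mathcal{S}_{\mathrm{dif}}(S_{\mathrm{eq}})$, defined to contain
the subsets of $\{1, \dots, p\} \setminus S_{\mathrm{eq}}$ having
cardinality $\ell$. Set $\beta_{S_{\mathrm{dif}}} = \beta_{\mathrm{dif}}$.

6. Set $S = S_{\mathrm{dif}} \cup S_{\mathrm{eq}}$ and $\beta_{S^c} = 0$. The measurement
matrix $\mathbf{X}$ is i.i.d. on $P_X$, and the observation vector $\mathbf{Y}$
is generated from $S$, $\mathbf{X}$, and $\beta$ according to the observation model, as in
the original problem setup.

7. Reveal the indices $S_{\mathrm{eq}}$ and the vectors $\beta_{\mathrm{dif}}$ and $\beta_{\mathrm{eq}}$
to the decoder. The decoder forms an estimate $\hat{S}_{\mathrm{dif}}$ of
$S_{\mathrm{dif}}$, and an error occurs if $\hat{S}_{\mathrm{dif}} \ne S_{\mathrm{dif}}$.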

The joint distribution of $S$ and $\beta$ is the same here as
in the original setup: The support set is uniform on the
$\binom{p}{k}$ elements of $\mathcal{S}$, and the distribution of the non-zero
entries $\beta_S$ is that of a uniformly random permutation of
$\beta' \sim P_{\beta_s}$. Since $P_{\beta_s}$ is permutation-invariant by assumption,
this yields $\beta_S \sim P_{\beta_s}$, as required. Thus, the only
difference in this modified setup is that the decoder has
further information, and it follows that any converse for this
setup implies the same converse for the original setup.
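
The sampling steps of the genie-aided setup can be sketched as follows. This is only an illustration: the partition rule used in Step 2 (placing the `ell` smallest-magnitude entries in the "dif" part) is a hypothetical stand-in for the functions $(s_{\mathrm{dif}}(\cdot), s_{\mathrm{eq}}(\cdot))$, the vector $\beta'$ of Step 1 is passed in rather than sampled, and the measurement and decoding steps are omitted.

```python
import random

def genie_aided_setup(p, k, ell, beta_prime, rng):
    """Sample (S_eq, S_dif, beta) following Steps 2-6, for a given beta' of length k."""
    # Step 2 (hypothetical rule): the ell smallest-magnitude entries form the "dif" part.
    order = sorted(range(k), key=lambda i: abs(beta_prime[i]))
    beta_dif = [beta_prime[i] for i in order[:ell]]
    beta_eq = [beta_prime[i] for i in order[ell:]]
    # Step 3: uniformly random permutations of the two subvectors.
    rng.shuffle(beta_dif)
    rng.shuffle(beta_eq)
    # Step 4: S_eq is a uniformly random subset of {0, ..., p-1} of size k - ell.
    s_eq = rng.sample(range(p), k - ell)
    # Step 5: S_dif is a uniformly random size-ell subset of the remaining indices.
    s_eq_set = set(s_eq)
    remaining = [i for i in range(p) if i not in s_eq_set]
    s_dif = rng.sample(remaining, ell)
    # Step 6: assemble beta, which is zero outside S = S_dif U S_eq.
    beta = [0.0] * p
    for index, value in zip(s_eq, beta_eq):
        beta[index] = value
    for index, value in zip(s_dif, beta_dif):
        beta[index] = value
    # Step 7 would reveal s_eq, beta_dif, and beta_eq to the decoder.
    return {"s_eq": s_eq, "s_dif": s_dif, "beta": beta,
            "beta_dif": beta_dif, "beta_eq": beta_eq}

if __name__ == "__main__":
    result = genie_aided_setup(20, 5, 2, [0.3, -1.2, 2.5, 0.7, -0.1], random.Random(1))
    print(result["s_eq"], result["s_dif"])
```

Indices here run from 0 to $p - 1$ rather than 1 to $p$. The non-zero entries of the assembled vector are a rearrangement of $\beta'$, which mirrors the observation above that the joint distribution of $S$ and $\beta$ is unchanged.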

Throughout the proof, we make use of the random vari¬ 
ables defined in the preceding steps, departing from the 
notation implicitly conditioned on S equaling a fixed value 
s (see ( flTI ) until the final step in obtaining ( [3^ . 

We first study the error probability for the genie-aided 
setting conditioned^on JS'eq,/3dif,/3eq) = (Seq, &dit, (>eq), 
denoted by Pe(seq, bdif, &eq)- By the identity P[^] = n 
£]+f‘[A{3£% we have for any event ,A(seq, fedit, &eq) that 

Pei^e<i: (^dif 5 ^eq) ^ ^P[>2^(5eq5 ^dif; ^eq)] 

- P[^(seq, &dif,^eq) P no eiTor]. (135) 

We fix the constant 7 ^ and choose 

,4(seq,&dit,&eq) = {t"(Xs,,,; Y|X,,,, 6 ) < (136) 


where £ = k — |seq|, and b := b{bdif,beq, Sdn, Sgq) equals 
&dif (respectively, beq) on the entries indexed by Sdif (respec¬ 
tively, Seq)- Using the definitions in (|20li-(|2T]i, and defining 
L’(sdif |seq, &dif, (>eq) to be the set of pairs (x, y) such that 
the decoder outputs Sdif given (seq, 6dif, &eq, x, y), we obtain 


P[yf(seq, 6dif, beq) H no eiTor] 


= E 


rn 


E 




SditeSdif(*oq) \ r / (x,y)GX>(sdif|s0q,bdit,boq) 


P. 


X 1 < log ■ 


FIX, X, /30 (y l^ddit: , b) 


P< 


'y\X^ Ps (y|^«eqj 


< It 


(137) 


< 


\ I / Sdif GSdif (Seq) (x.y)GX>(Sdif|Seq,bdif,beq) 

x^F|x,,,/3,(y|xs0q,&)e'^^ (138) 

pit 

(139) 




where ( pJTl ) follows since an error occurs if and only 
if (x,y) ^ T>(sdif|Seq,^dif,6eq), P38| ) follows by upper 
bounding Py\x p using the event in the indicator func¬ 
tion, and ( |139| l follows since the sets 27(sdif |seq, 6dif, &eq) 
are disjoint, and their union over Sdif is the entire space of 
(x, y) pairs. 

Averaging ( pGS) ) over (S'eq,/3',/3dif,/3eq) and applying 
( [T 39 I 1 , we obtain 


Pe>^Pps{b') P[(/9dif,/3eq) = (fodif,(>eq)|6'] 

b' ^difj^eq 

^ E E / p \ tp-k+e\ 

SeqGSeq(^)sditGSdif(Seq) I i ) 


X P 


^ (^Sdif 7 I ^Seq j b) E I ^dif 7 ^eq 7 (^dif 7 be 


pit 


^p-k+£j I ’ 


(140) 


where I = |&dif|j and the conditioning on h' is a shorthand 
for /3' = 6', and similarly for the second probability. Finally, 
we claim that this recovers ( |33| ) upon setting 

7 £ = log +log(5i. (141) 


To see this, we first note that all of the terms in the 
summations over Sdif and Seq in ( |140| i are equal, since in 
the probability appearing in the summand, the entries bg^^i 
and are the same for any such pair, namely, bg^.^ = &dif 
and bg^^ = beq (recall also the symmetry of Py\XsPs "'^h 
respect to S assumed in Section |n|. Due to Assumption 
this probability also coincides with th^ in ( [33] l with bg := b', 
regardless of the realization of (/3dif, /3eq) given /3'; the only 
randomness in the corresponding distribution is that of the 
two random permutations of the subvectors. 

C. Extensions to Partial Recovery 

The achievability analysis in Section V-A extends immediately
to handle the partial recovery criterion,
since we have already split the error events according to the
amount of overlap between the true support and the incorrect
support. The only difference is that the decoder searches for
a set $s$ such that (117) holds whenever $|s_{\mathrm{dif}}| > d_{\max}$ (as
opposed to $s_{\mathrm{dif}} \ne \emptyset$), and chooses one arbitrarily if multiple
such $s$ exist. It follows that Theorem 1 remains true when
the union in (24) is restricted to $|s_{\mathrm{dif}}| \in \{d_{\max} + 1, \dots, k\}$.

The extension of the converse analysis in Section [V-B| is 
less immediate, but still straightforward. We hrst recall the 
observation from that the performance metric in Q 
allows us to focus without loss of generality on decoders 
such that the estimated support S (or ^difUS'eq in the genie- 
aided setting) has cardinality k almost surely. For any such 
decoder, the dehnition in 0 is unchanged when the second 
term in the union is removed. 

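We restrict the partitions $(s_{\mathrm{dif}}(b_s), s_{\mathrm{eq}}(b_s))$ of $s$ to satisfy
$|s_{\mathrm{dif}}(b_s)| > d_{\max}$. In (137)-(139), we change the definition
of $\mathcal{D}(s_{\mathrm{dif}} | s_{\mathrm{eq}}, b_{\mathrm{dif}}, b_{\mathrm{eq}})$ to be the set of pairs $(\mathbf{x}, \mathbf{y})$ such that
the decoder outputs a set $\hat{s}_{\mathrm{dif}}$ such that $|s_{\mathrm{dif}} \setminus \hat{s}_{\mathrm{dif}}| \le
d_{\max}$. This means that the sets $\mathcal{D}(\cdot | s_{\mathrm{eq}}, b_{\mathrm{dif}}, b_{\mathrm{eq}})$ are no
longer disjoint. However, we can easily count the number
of such sets that each $(\mathbf{x}, \mathbf{y})$ pair falls into. For
fixed $(s_{\mathrm{eq}}, \hat{s}_{\mathrm{dif}})$ and $d \in \{0, \dots, d_{\max}\}$, the number of
sets $s_{\mathrm{dif}} \subseteq \{1, \dots, p\} \setminus s_{\mathrm{eq}}$ of cardinality $\ell$ such that $|s_{\mathrm{dif}} \setminus \hat{s}_{\mathrm{dif}}| = d$ is
$\binom{p-k}{d}\binom{\ell}{d}$. Hence, each $(\mathbf{x}, \mathbf{y})$ pair is included
in $\sum_{d=0}^{d_{\max}} \binom{p-k}{d}\binom{\ell}{d}$ of the sets $\mathcal{D}(\cdot | s_{\mathrm{eq}}, b_{\mathrm{dif}}, b_{\mathrm{eq}})$,
and (139) is replaced by

$$ \mathbb{P}[\mathcal{A}(s_{\mathrm{eq}}, b_{\mathrm{dif}}, b_{\mathrm{eq}}) \cap \text{no error}] \le \frac{\sum_{d=0}^{d_{\max}} \binom{p-k}{d}\binom{\ell}{d}}{\binom{p-k+\ell}{\ell}} e^{\gamma_\ell}. \tag{142} $$

Thus, the converse bound remains true when the pair
$(s_{\mathrm{dif}}(\cdot), s_{\mathrm{eq}}(\cdot))$ is constrained to satisfy $|s_{\mathrm{dif}}| \in
\{d_{\max} + 1, \dots, k\}$, and $\binom{p-k+|s_{\mathrm{dif}}|}{|s_{\mathrm{dif}}|}$ is replaced by

$$ \frac{\binom{p-k+|s_{\mathrm{dif}}|}{|s_{\mathrm{dif}}|}}{\sum_{d=0}^{d_{\max}} \binom{p-k}{d}\binom{|s_{\mathrm{dif}}|}{d}}. $$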
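
The counting step used for partial recovery can be verified by brute force on small instances. The sketch below fixes a reference set of size `ell` inside a universe of `num_outside + ell` indices (so `num_outside` plays the role of $p - k$) and counts the size-`ell` subsets that have exactly `d` elements outside the reference set; the function name is hypothetical.

```python
from itertools import combinations
from math import comb

def count_sets_at_distance(num_outside, ell, d):
    """Count size-ell subsets of a (num_outside + ell)-element universe that contain
    exactly d elements outside a fixed reference set of size ell."""
    universe = range(num_outside + ell)
    reference = set(range(ell))
    return sum(1 for candidate in combinations(universe, ell)
               if len(set(candidate) - reference) == d)

if __name__ == "__main__":
    num_outside, ell = 5, 3
    for d in range(ell + 1):
        print(d, count_sets_at_distance(num_outside, ell, d), comb(num_outside, d) * comb(ell, d))
```

For each `d` the brute-force count should equal $\binom{p-k}{d}\binom{\ell}{d}$: choose which `d` outside indices enter the set and which `d` reference indices they replace. Summing over all `d` recovers the total number $\binom{p-k+\ell}{\ell}$ of candidate sets.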

VI. Conclusion 

Taking an approach motivated by thresholding techniques 
in channel coding, we have presented a framework for de¬ 
veloping necessary and sufficient conditions on the number 
of measurements for exact and partial support recovery with 
probabilistic models. We have provided several new results 
for the linear, 1-bit, and group testing models, as well as 
general discrete observation models. In several cases, we 
have provided exact asymptotic thresholds on the number 
of measurements with strong converse results. 

There are several possible directions for future research.
While we have focused on i.i.d. measurement matrices,
it would be of significant interest to consider other types
of random matrices, and to present converse results that
hold for arbitrary measurement matrices, subject to suitable
constraints such as power constraints. We provided some
work in these directions for specific models in [44], [45].

One could also attempt to move from standard sparsity
models to structured sparsity models, and from probabilistic
guarantees with random non-zero entries to minimax guarantees.
There are several additional non-linear models that our
general results could be applied to, such as the Poisson
and gamma models. Finally, it may be interesting to apply
similar analysis techniques to other statistical problems
beyond support recovery.


Appendix A 

Concentration Inequalities 


In order to apply our general bounds to specific models, 
we use concentration inequalities to obtain expressions for 
ip£ and tjj'fi in (|^i and ( |4T] ), seeking to make the corre¬ 
sponding terms in ( ^ and (|45|l vanish. Here we present two 


general inequalities that will be used throughout Section IV 


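Proposition 9. For general observation models, we have for
all $(s_{\mathrm{dif}}, s_{\mathrm{eq}}, b_s)$ and $\delta > 0$ that

$$ \mathbb{P}\Big[ \big| \imath^n(\mathbf{X}_{s_{\mathrm{dif}}}; \mathbf{Y} | \mathbf{X}_{s_{\mathrm{eq}}}, b_s) - n I_{s_{\mathrm{dif}}, s_{\mathrm{eq}}}(b_s) \big| \ge n\delta \,\Big|\, \beta_s = b_s \Big] \le \frac{V_{s_{\mathrm{dif}}, s_{\mathrm{eq}}}(b_s)}{\delta^2 n}, \tag{143} $$

where $V_{s_{\mathrm{dif}}, s_{\mathrm{eq}}}(b_s)$ is the variance of the information density conditioned on $\beta_s = b_s$. Moreover, if the
observations lie in a finite set $\mathcal{Y} \subset \mathbb{R}$ with probability one,
then the following holds for all $(s_{\mathrm{dif}}, s_{\mathrm{eq}}, b_s)$ and $\delta > 0$:

$$ V_{s_{\mathrm{dif}}, s_{\mathrm{eq}}}(b_s) \le |\mathcal{Y}| \Big( \frac{4}{e} \Big)^2. \tag{144} $$

Before providing the proof, we state the following generalization
of (144) to higher-order moments.

Proposition 10. If the observations lie in a finite set
$\mathcal{Y} \subset \mathbb{R}$ with probability one, then the following holds for
all $(s_{\mathrm{dif}}, s_{\mathrm{eq}}, b_s)$ and $\delta > 0$:

$$ \mathbb{P}\Big[ \big| \imath^n(\mathbf{X}_{s_{\mathrm{dif}}}; \mathbf{Y} | \mathbf{X}_{s_{\mathrm{eq}}}, b_s) - n I_{s_{\mathrm{dif}}, s_{\mathrm{eq}}}(b_s) \big| \ge n\delta \,\Big|\, \beta_s = b_s \Big] \le 2 \exp\Big( -\frac{\delta^2 n}{2(8|\mathcal{Y}| + 2\delta)} \Big). \tag{145} $$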

In the remainder of this appendix, we prove these propositions.
Equation (143) follows from Chebyshev's inequality,
so we focus our attention on (144)-(145). We make use of
the following form of Bernstein's inequality [38, Sec. 2.8].


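Lemma 1. Let $W_1, \dotsc, W_n$ be independent real-valued
random variables such that

$$\sum_{i=1}^{n} \mathbb{E}[W_i^2] \le r \tag{146}$$

$$\sum_{i=1}^{n} \mathbb{E}\big[|W_i|^q\big] \le \frac{q!}{2}\, r c^{q-2} \quad (q \ge 3) \tag{147}$$

for some $r, c > 0$. Then

$$\mathbb{P}\Big[\sum_{i=1}^{n} \big(W_i - \mathbb{E}[W_i]\big) \ge t\Big] \le \exp\Big(-\frac{t^2}{2(r + ct)}\Big) \tag{148}$$

for all $t > 0$.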
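As a quick sanity check on how a bound of the form (148) specializes to (145), the following minimal sketch evaluates the Bernstein-type tail with $c = 2$, $r = 8n|\mathcal{Y}|$ and $t = \delta n$, and compares it with the closed-form expression. The function names and the numerical values of $n$, $\delta$ and $|\mathcal{Y}|$ are illustrative assumptions, not quantities taken from the paper.

```python
import math


def bernstein_tail(t: float, r: float, c: float) -> float:
    """Right-hand side of a Bernstein-type bound: exp(-t^2 / (2 (r + c t)))."""
    return math.exp(-t * t / (2.0 * (r + c * t)))


def finite_alphabet_tail(n: int, delta: float, alphabet_size: int) -> float:
    """Two-sided bound of the form 2 exp(-delta^2 n / (2 (8 |Y| + 2 delta)))."""
    return 2.0 * math.exp(
        -delta ** 2 * n / (2.0 * (8.0 * alphabet_size + 2.0 * delta))
    )


# Hypothetical illustration values (assumptions, not from the paper).
n, delta, alphabet_size = 5000, 0.5, 2

# Plug c = 2, r = 8 n |Y|, t = delta n into the one-sided bound;
# the factor 2 accounts for applying it to both tails.
via_lemma = 2.0 * bernstein_tail(t=delta * n, r=8.0 * n * alphabet_size, c=2.0)
direct = finite_alphabet_tail(n, delta, alphabet_size)
print(via_lemma, direct)
```

The two printed values should agree up to floating-point rounding, since the factor $n$ cancels between $t^2$ and $r + ct$.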

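To bound the moments of $\imath$, we follow the arguments of
[29, Rmk. 3.1.1] and [47, App. D]. Recall the definition of
the information density in (21). For any $q \ge 2$, we have
from Minkowski's inequality that

$$\Big(\mathbb{E}\big[|\imath(X_{s_{\mathrm{dif}}};Y|X_{s_{\mathrm{eq}}},b_s)|^q\big]\Big)^{1/q} \le \bigg(\mathbb{E}\bigg[\Big(\log\frac{1}{P_{Y|X_{s_{\mathrm{dif}}}X_{s_{\mathrm{eq}}}\beta_s}(Y|X_{s_{\mathrm{dif}}},X_{s_{\mathrm{eq}}},b_s)}\Big)^q\bigg]\bigg)^{1/q} + \bigg(\mathbb{E}\bigg[\Big(\log\frac{1}{P_{Y|X_{s_{\mathrm{eq}}}\beta_s}(Y|X_{s_{\mathrm{eq}}},b_s)}\Big)^q\bigg]\bigg)^{1/q}, \tag{149}$$

where here and subsequently we implicitly condition on
$\beta_s = b_s$. For any given $(x_{s_{\mathrm{dif}}}, x_{s_{\mathrm{eq}}})$, the remaining averaging
over $Y$ in the first term has the form

$$\sum_{y} P_{Y|X_{s_{\mathrm{dif}}}X_{s_{\mathrm{eq}}}\beta_s}(y|x_{s_{\mathrm{dif}}},x_{s_{\mathrm{eq}}},b_s)\Big(\log\frac{1}{P_{Y|X_{s_{\mathrm{dif}}}X_{s_{\mathrm{eq}}}\beta_s}(y|x_{s_{\mathrm{dif}}},x_{s_{\mathrm{eq}}},b_s)}\Big)^q, \tag{150}$$

and is thus upper bounded by $|\mathcal{Y}|\big(\frac{q}{e}\big)^q$, since the function
$f(z) = z\log^q\frac{1}{z}$ has a maximum value of $\big(\frac{q}{e}\big)^q$ for $z \in [0,1]$.
Handling the second term in (149) similarly, we obtain

$$\Big(\mathbb{E}\big[|\imath(X_{s_{\mathrm{dif}}};Y|X_{s_{\mathrm{eq}}},b_s)|^q\big]\Big)^{1/q} \le 2\Big(|\mathcal{Y}|\Big(\frac{q}{e}\Big)^q\Big)^{1/q}, \tag{151}$$

or equivalently

$$\mathbb{E}\big[|\imath(X_{s_{\mathrm{dif}}};Y|X_{s_{\mathrm{eq}}},b_s)|^q\big] \le \Big(\frac{q}{e}\Big)^q\, 4|\mathcal{Y}|\,2^{q-2} \tag{152}$$

$$\le \frac{q!}{2}\, 8|\mathcal{Y}|\,2^{q-2}, \tag{153}$$

where (153) follows since $\big(\frac{q}{e}\big)^q \le q!$.

We obtain (144) by setting $q = 2$ in (152). Furthermore,
we obtain Proposition 10 using Lemma 1 with $c = 2$, $r = n \cdot 8|\mathcal{Y}|$, and $t = \delta n$.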
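The two elementary facts used in this moment bound can be checked numerically. The sketch below is an illustration only: it assumes the function in question is $f(z) = z\log^q(1/z)$ on $[0,1]$ with natural logarithms, and the grid resolution and range of $q$ are arbitrary choices. It confirms that a grid maximum of $f$ matches $(q/e)^q$ and that $(q/e)^q \le q!$.

```python
import math


def f(z: float, q: int) -> float:
    """f(z) = z * log(1/z)^q, extended by continuity with f(0) = 0."""
    return z * math.log(1.0 / z) ** q if z > 0.0 else 0.0


def grid_max(q: int, points: int = 100000) -> float:
    """Maximum of f over the grid {1/points, 2/points, ..., 1}."""
    return max(f(i / points, q) for i in range(1, points + 1))


for q in range(2, 9):
    closed_form = (q / math.e) ** q  # value of f at the maximizer z = e^{-q}
    print(q, grid_max(q), closed_form, math.factorial(q))
```

Setting the derivative of $f$ to zero gives $\log(1/z) = q$, which is where the value $(q/e)^q$ comes from.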


Appendix B 

Proofs of Auxiliary Results for the Linear 
Model 

A. Proof of Proposition 

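We again use Lemma 1, and we thus seek suitable
values for $r$ and $c$. Throughout the proof, we consider the
random variables $(X_{s_{\mathrm{dif}}}, X_{s_{\mathrm{eq}}}, Y)$,
implicitly conditioning on $\beta_s = b_s$. From (55), we have
$Z = Y - \sum_{i \in s} X_i b_i$, and a direct calculation gives

$$P_{Y|X_{s_{\mathrm{dif}}}X_{s_{\mathrm{eq}}}\beta_s}(Y|X_{s_{\mathrm{dif}}},X_{s_{\mathrm{eq}}},b_s) = \phi(Z; 0, \sigma^2) \tag{154}$$

$$P_{Y|X_{s_{\mathrm{eq}}}\beta_s}(Y|X_{s_{\mathrm{eq}}},b_s) = \phi\Big(Z + \sum_{i \in s_{\mathrm{dif}}} X_i b_i;\, 0,\, \sigma^2 + \sum_{i \in s_{\mathrm{dif}}} b_i^2\Big), \tag{155}$$

where $\phi(\cdot\,; \mu, \sigma^2)$ is the $N(\mu, \sigma^2)$ density function. Substituting
these into (21) gives

$$\imath(X_{s_{\mathrm{dif}}};Y|X_{s_{\mathrm{eq}}},b_s) = I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s) - \frac{Z^2}{2\sigma^2} + \frac{\big(Z + \sum_{i \in s_{\mathrm{dif}}} X_i b_i\big)^2}{2\big(\sigma^2 + \sum_{i \in s_{\mathrm{dif}}} b_i^2\big)}, \tag{156}$$

where $I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s)$ is given in (56).

The mean of (156) is $I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_s)$, and we will apply
Lemma 1 with $W_i$ corresponding to the sum of the second
and third terms on the right-hand side. We can write these
in terms of independent $N(0,1)$ random variables (denoted
by $Z_1$ and $Z_2$) as follows:

$$W = -\frac{1}{2}Z_1^2 + \frac{(\sigma Z_1 + \sigma_{s_{\mathrm{dif}}} Z_2)^2}{2(\sigma^2 + \sigma_{s_{\mathrm{dif}}}^2)} \tag{157}$$

$$= -\frac{\sigma_{s_{\mathrm{dif}}}^2}{2(\sigma^2 + \sigma_{s_{\mathrm{dif}}}^2)}\big(Z_1^2 - Z_2^2\big) + \frac{\sigma\sigma_{s_{\mathrm{dif}}}}{\sigma^2 + \sigma_{s_{\mathrm{dif}}}^2} Z_1 Z_2, \tag{158}$$

where we have used the definitions in the proposition
statement, and (158) follows from simple manipulations.
Defining $Z_{\max} = \max\{|Z_1|, |Z_2|\}$, we have the following
with probability one:

$$|W| \le \frac{\sigma_{s_{\mathrm{dif}}}^2}{2(\sigma^2 + \sigma_{s_{\mathrm{dif}}}^2)}\, 2Z_{\max}^2 + \frac{\sigma\sigma_{s_{\mathrm{dif}}}}{\sigma^2 + \sigma_{s_{\mathrm{dif}}}^2} Z_{\max}^2 \tag{159}$$

$$= \frac{\sigma_{s_{\mathrm{dif}}}(\sigma + \sigma_{s_{\mathrm{dif}}})}{\sigma^2 + \sigma_{s_{\mathrm{dif}}}^2} Z_{\max}^2. \tag{160}$$

Since $\mathbb{E}[Z_{\max}^4] \le \mathbb{E}[Z_1^4 + Z_2^4] = 6$, we obtain

$$\mathbb{E}[W^2] \le 6\Big(\frac{\sigma_{s_{\mathrm{dif}}}(\sigma + \sigma_{s_{\mathrm{dif}}})}{\sigma^2 + \sigma_{s_{\mathrm{dif}}}^2}\Big)^2. \tag{161}$$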


Similarly, we can bound the higher moments as follows: 



(162) 

(163) 

(164) 


where ( |163| ) follows by the same argument as ( |160| ) and the 
fact that the 2g-th moment of an N{0, 1) random variable 
is ^T{qG- 4), and ( |164| ) follows since r( 9 -|- 4) < y^g!. 

Combining ( |161| ) and ( |164] i, we see that the random 
variables Wi = i{xiait',Y^'^'^\xill,bg) - satisfy 

the conditions of Lemma with r = and c = 

(see (|58]l). We thus obtain the desired result from ( |148] l by 
identifying t = 6n. 
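The algebra behind the zero-mean variable $W$ in this proof can be sanity-checked numerically. The sketch below is an illustration under stated assumptions: it takes $W = -\frac{1}{2}Z_1^2 + \frac{(\sigma Z_1 + \sigma_{\mathrm{dif}} Z_2)^2}{2(\sigma^2 + \sigma_{\mathrm{dif}}^2)}$ with independent standard normal $Z_1, Z_2$, which is one reading of the garbled display, and the values of `sigma` and `sigma_dif` are hypothetical. It compares this with the expanded form, checks the almost-sure bound in terms of $Z_{\max}$, and estimates the mean.

```python
import random


def w_direct(z1: float, z2: float, sigma: float, sigma_dif: float) -> float:
    """W written as -Z1^2/2 + (sigma Z1 + sigma_dif Z2)^2 / (2 (sigma^2 + sigma_dif^2))."""
    total = sigma ** 2 + sigma_dif ** 2
    return -0.5 * z1 ** 2 + (sigma * z1 + sigma_dif * z2) ** 2 / (2.0 * total)


def w_expanded(z1: float, z2: float, sigma: float, sigma_dif: float) -> float:
    """The same quantity after expanding the square and collecting terms."""
    total = sigma ** 2 + sigma_dif ** 2
    return (-(sigma_dif ** 2) / (2.0 * total) * (z1 ** 2 - z2 ** 2)
            + sigma * sigma_dif / total * z1 * z2)


def w_abs_bound(z1: float, z2: float, sigma: float, sigma_dif: float) -> float:
    """Almost-sure bound sigma_dif (sigma + sigma_dif) / (sigma^2 + sigma_dif^2) * Zmax^2."""
    total = sigma ** 2 + sigma_dif ** 2
    z_max = max(abs(z1), abs(z2))
    return sigma_dif * (sigma + sigma_dif) / total * z_max ** 2


rng = random.Random(0)          # fixed seed for reproducibility
sigma, sigma_dif = 1.3, 0.7     # hypothetical values
samples = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(20000)]

max_gap = max(abs(w_direct(a, b, sigma, sigma_dif) - w_expanded(a, b, sigma, sigma_dif))
              for a, b in samples)
bound_ok = all(abs(w_direct(a, b, sigma, sigma_dif))
               <= w_abs_bound(a, b, sigma, sigma_dif) + 1e-12 for a, b in samples)
empirical_mean = sum(w_direct(a, b, sigma, sigma_dif) for a, b in samples) / len(samples)
print(max_gap, bound_ok, empirical_mean)
```

The empirical mean is only a Monte Carlo estimate, so it should be close to zero rather than exactly zero.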




































B. Proof of Proposition ^ 

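Since $\mathbf{Y} = \mathbf{X}_s\beta_s + \mathbf{Z}$, we have

$$I_0 = I(\beta_s; \mathbf{Y}|\mathbf{X}_s) = H(\mathbf{Y}|\mathbf{X}_s) - H(\mathbf{Y}|\mathbf{X}_s, \beta_s) \tag{165}$$

$$= H(\mathbf{X}_s\beta_s + \mathbf{Z}|\mathbf{X}_s) - H(\mathbf{Z}). \tag{166}$$

From [39, Ch. 9], we have $H(\mathbf{Z}) = \frac{n}{2}\log(2\pi e\sigma^2)$ and
$H(\mathbf{X}_s\beta_s + \mathbf{Z}|\mathbf{X}_s = \mathbf{x}_s) = \frac{1}{2}\log\big((2\pi e)^n \det(\sigma^2\mathbf{I}_n + \sigma_\beta^2\mathbf{x}_s\mathbf{x}_s^T)\big)$,
where $\mathbf{I}_n$ is the $n \times n$ identity matrix. Averaging
the latter over $\mathbf{X}_s$ and substituting these into (166) gives

$$I_0 = \frac{1}{2}\mathbb{E}\Big[\log\det\Big(\mathbf{I}_n + \frac{\sigma_\beta^2}{\sigma^2}\mathbf{X}_s\mathbf{X}_s^T\Big)\Big] \tag{167}$$

$$= \frac{1}{2}\mathbb{E}\Big[\log\det\Big(\mathbf{I}_k + \frac{\sigma_\beta^2}{\sigma^2}\mathbf{X}_s^T\mathbf{X}_s\Big)\Big] \tag{168}$$

$$= \frac{1}{2}\sum_{i=1}^{k}\mathbb{E}\Big[\log\Big(1 + \frac{\sigma_\beta^2}{\sigma^2}\lambda_i(\mathbf{X}_s^T\mathbf{X}_s)\Big)\Big] \tag{169}$$

$$\le \frac{k}{2}\log\Big(1 + \frac{\sigma_\beta^2}{\sigma^2}n\Big), \tag{170}$$

where (168) follows from the identity $\det(\mathbf{I} + \mathbf{A}\mathbf{B}) = \det(\mathbf{I} + \mathbf{B}\mathbf{A})$,
(169) follows by writing the determinant as a
product of eigenvalues (denoted by $\lambda_i(\cdot)$), and (170) follows
from Jensen's inequality and the following calculation:

$$\frac{1}{k}\sum_{i=1}^{k}\mathbb{E}\big[\lambda_i(\mathbf{X}_s^T\mathbf{X}_s)\big] = \frac{1}{k}\mathbb{E}\big[\mathrm{Tr}(\mathbf{X}_s^T\mathbf{X}_s)\big] = \mathbb{E}\big[\mathbf{X}_1^T\mathbf{X}_1\big] = n. \tag{171}$$

This concludes the proof of (66).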

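We now turn to the bounding of the variance. Again using
the fact that $\mathbf{Y} = \mathbf{X}_s\beta_s + \mathbf{Z}$, we have

$$\log\frac{P_{\mathbf{Y}|\mathbf{X}_s,\beta_s}(\mathbf{Y}|\mathbf{X}_s,\beta_s)}{P_{\mathbf{Y}|\mathbf{X}_s}(\mathbf{Y}|\mathbf{X}_s)} = \log\frac{P_{\mathbf{Z}}(\mathbf{Z})}{P_{\mathbf{Y}|\mathbf{X}_s}(\mathbf{X}_s\beta_s + \mathbf{Z}|\mathbf{X}_s)} \tag{172}$$

$$= \frac{1}{2}\log\det\Big(\mathbf{I} + \frac{\sigma_\beta^2}{\sigma^2}\mathbf{X}_s\mathbf{X}_s^T\Big) - \frac{1}{2\sigma^2}\mathbf{Z}^T\mathbf{Z} + \frac{1}{2}(\mathbf{X}_s\beta_s + \mathbf{Z})^T(\sigma^2\mathbf{I} + \sigma_\beta^2\mathbf{X}_s\mathbf{X}_s^T)^{-1}(\mathbf{X}_s\beta_s + \mathbf{Z}), \tag{173}$$

where $P_{\mathbf{Z}}$ is the density of $\mathbf{Z}$, and (173) follows by
a direct substitution of the densities $P_{\mathbf{Z}} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$
and $P_{\mathbf{Y}|\mathbf{X}_s}(\cdot|\mathbf{x}_s) \sim N(\mathbf{0}, \sigma^2\mathbf{I} + \sigma_\beta^2\mathbf{x}_s\mathbf{x}_s^T)$, where $\mathbf{0}$ is
the zero vector. Observe now that $\frac{1}{\sigma^2}\mathbf{Z}^T\mathbf{Z}$ is a sum of
$n$ independent $\chi^2$ random variables with one degree of
freedom (each having a variance of 2), and hence, the
second term in (173) has a variance of $\frac{n}{2}$. Moreover, by
writing $\mathbf{M}^{-1} = (\mathbf{M}^{-\frac{1}{2}})^T\mathbf{M}^{-\frac{1}{2}}$ for the symmetric positive
definite matrix $\mathbf{M} = \sigma^2\mathbf{I} + \sigma_\beta^2\mathbf{X}_s\mathbf{X}_s^T$, where $(\cdot)^{-\frac{1}{2}}$ denotes
the positive definite matrix square root of the inverse, we
find that the final term in (173) is, up to the factor $\frac{1}{2}$, distributed as a sum of
$n$ independent $\chi^2$ variables when conditioned on any value of $\mathbf{X}_s$, and
hence, the same is true unconditionally. We therefore again
obtain a variance of $\frac{n}{2}$, and (67) follows using the identity
$\mathrm{Var}[A + B] \le \mathrm{Var}[A] + \mathrm{Var}[B] + 2\max\{\mathrm{Var}[A], \mathrm{Var}[B]\}$.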


Appendix C 

Proofs of Auxiliary Results for the 1-bit Model 


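We first write down the relevant probability distributions
and information densities conditioned on a fixed value $b_s$
of $\beta_s$. Under the model $Y = \mathrm{sign}\big(\sum_{i \in s} X_i\beta_i + Z\big)$ with
$X_i \sim N(0,1)$ and $Z \sim N(0,\sigma^2)$, we have

$$P_{Y|X_s\beta_s}(1|x_s, b_s) = \mathbb{P}\Big[\sum_{i \in s} x_i b_i + Z > 0\Big] \tag{174}$$

$$= Q\Big(-\frac{1}{\sigma}\sum_{i \in s} x_i b_i\Big). \tag{175}$$

Similarly, for any partition of $s$ into $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$, we can write
$Y = \mathrm{sign}\big(\sum_{i \in s_{\mathrm{eq}}} X_i b_i + \sum_{i \in s_{\mathrm{dif}}} X_i b_i + Z\big)$ and apply the
same steps to conclude that

$$P_{Y|X_{s_{\mathrm{eq}}}\beta_s}(1|x_{s_{\mathrm{eq}}}, b_s) = Q\bigg(-\frac{\sum_{i \in s_{\mathrm{eq}}} x_i b_i}{\sqrt{\sigma^2 + \sum_{i \in s_{\mathrm{dif}}} b_i^2}}\bigg). \tag{176}$$

The corresponding probabilities for $y = -1$ are one minus
these expressions, which amounts to multiplying the argument
to the Q-function by $-1$. Substitution into (21) gives

$$\imath(x_{s_{\mathrm{dif}}}; y|x_{s_{\mathrm{eq}}}, b_s) = \log\frac{Q\Big(-\frac{y}{\sigma}\sum_{i \in s} x_i b_i\Big)}{Q\bigg(-\frac{y\sum_{i \in s_{\mathrm{eq}}} x_i b_i}{\sqrt{\sigma^2 + \sum_{i \in s_{\mathrm{dif}}} b_i^2}}\bigg)} \tag{177}$$

for $y \in \{-1, 1\}$.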

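Throughout this appendix, we will use the fact that the
first two derivatives of the function

$$f(x) := H_2(Q(x)) \tag{178}$$

are given by

$$f'(x) = -\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\log\frac{1 - Q(x)}{Q(x)} \tag{179}$$

$$f''(x) = -\frac{1}{2\pi}\frac{e^{-x^2}}{Q(x)(1 - Q(x))} + \frac{x}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\log\frac{1 - Q(x)}{Q(x)}. \tag{180}$$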
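The derivative formulas for $f(x) = H_2(Q(x))$ can be checked against finite differences. The following sketch is an illustration, assuming natural logarithms, the standard Gaussian tail $Q(x) = \frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$, and the expressions $f'(x) = -\phi(x)\log\frac{1-Q(x)}{Q(x)}$ and $f''(x) = -\frac{\phi(x)^2}{Q(x)(1-Q(x))} + x\phi(x)\log\frac{1-Q(x)}{Q(x)}$ obtained from the chain rule; the step size and test points are arbitrary.

```python
import math


def Q(x: float) -> float:
    """Standard Gaussian tail probability P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def phi(x: float) -> float:
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def f(x: float) -> float:
    """f(x) = H_2(Q(x)) in nats; 1 - Q(x) is computed as Q(-x) for accuracy."""
    q, qc = Q(x), Q(-x)
    return -(q * math.log(q) + qc * math.log(qc))


def f1(x: float) -> float:
    """Claimed first derivative."""
    return -phi(x) * math.log(Q(-x) / Q(x))


def f2(x: float) -> float:
    """Claimed second derivative."""
    return -phi(x) ** 2 / (Q(x) * Q(-x)) + x * phi(x) * math.log(Q(-x) / Q(x))


def central_diff(g, x: float, h: float = 1e-4) -> float:
    """Symmetric finite-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)


points = [-2.0, -0.7, 0.0, 0.3, 1.5]
err_first = max(abs(central_diff(f, x) - f1(x)) for x in points)
err_second = max(abs(central_diff(f1, x) - f2(x)) for x in points)
print(err_first, err_second, f2(0.0), -2.0 / math.pi)
```

At $x = 0$ the formulas give $f'(0) = 0$ and $f''(0) = -2/\pi$, which is what the last two printed values compare.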


A. Proof of Proposition ^Part (i) 

Recalling that the coefficients $X_i$ ($i \in s$) are i.i.d.
$N(0,1)$, we directly obtain from (175) that

$$H(Y|X_s, \beta_s = b_s) = \mathbb{E}\Big[H_2\Big(Q\Big(-\frac{1}{\sigma}\sum_{i \in s} X_i b_i\Big)\Big)\Big] \tag{181}$$

$$= \mathbb{E}\bigg[H_2\bigg(Q\bigg(W\sqrt{\frac{1}{\sigma^2}\sum_{i \in s} b_i^2}\bigg)\bigg)\bigg], \tag{182}$$

where $W \sim N(0,1)$. By evaluating $H(Y|X_{s_{\mathrm{eq}}}, \beta_s = b_s)$
similarly using (176) and taking the difference between the
two, we obtain
B. Proof of Proposition ^Part (ii)

We obtain from (179)-(180) that $f'(0) = 0$ and $f''(0) = -\frac{2}{\pi}$.
By performing further differentiations, one can also
verify that $f^{(3)}(0) = 0$, and that $|f^{(4)}(x)|$ is uniformly upper
bounded by $f^{(4)}(0) = \frac{8(\pi - 1)}{\pi^2}$. We thus obtain via a
fourth-order Taylor expansion that

$$\log 2 - \frac{1}{\pi}x^2 - \frac{\pi - 1}{3\pi^2}x^4 \le H_2(Q(x)) \le \log 2 - \frac{1}{\pi}x^2 + \frac{\pi - 1}{3\pi^2}x^4 \tag{183}$$

for all $x \in \mathbb{R}$. Substituting (183) into (75) and noting that
the fourth moments of the arguments to $H_2(Q(\cdot))$ therein
decay to zero strictly faster than the second moments (by
the assumptions on $k$, $b_{\min}$ and $b_{\max}$), we obtain


I 


"^dif ?'®eq 


ibs) = 


1 ^ 1 






r 2 * 

i^s 


E 


zGSdif i 


(1 + 0 ( 1 )). 

(184) 


Again using the assumptions on $k$, $b_{\min}$ and $b_{\max}$, we
observe that the denominator is dominated by the term $\sigma^2$,
thus yielding (76).
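The fourth-order Taylor sandwich for $H_2(Q(x))$ used in this part can be probed numerically. The sketch below rests on assumptions: the quartic constant $\frac{\pi - 1}{3\pi^2}$ is the one obtained by expanding $H_2(\frac{1}{2} + u)$ and $Q(x)$ around zero (natural logarithms), and the grid, tolerance and function names are illustrative choices. It checks the two-sided bound on a grid and estimates the quartic coefficient near the origin.

```python
import math

LOG2 = math.log(2.0)
C4 = (math.pi - 1.0) / (3.0 * math.pi ** 2)  # assumed quartic constant


def Q(x: float) -> float:
    """Standard Gaussian tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def f(x: float) -> float:
    """H_2(Q(x)) in nats, using Q(-x) = 1 - Q(x)."""
    q, qc = Q(x), Q(-x)
    return -(q * math.log(q) + qc * math.log(qc))


def lower(x: float) -> float:
    return LOG2 - x * x / math.pi - C4 * x ** 4


def upper(x: float) -> float:
    return LOG2 - x * x / math.pi + C4 * x ** 4


grid = [i / 20.0 for i in range(-120, 121)]  # x from -6 to 6 in steps of 0.05
violations = [x for x in grid
              if not (lower(x) - 1e-12 <= f(x) <= upper(x) + 1e-12)]

# Empirical quartic coefficient near zero: (f(x) - log 2 + x^2/pi) / x^4.
x0 = 0.05
quartic_estimate = (f(x0) - LOG2 + x0 * x0 / math.pi) / x0 ** 4
print(violations, quartic_estimate, C4)
```

A grid check of this kind is evidence rather than proof; the argument in the text relies on the uniform bound on the fourth derivative.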


C. Proof of Proposition ^Part (iii)

In this part, we have assumed that the values {bi} take 
a common value bg. Since cr^ = 0(1), we may set = 
1 without loss of generality; the implied constant can be 
factored into bo. In this case, with £ = 1 simplifies to 


=E 




-H2(q{w^, 


(185) 


By the assumptions k = &(p) and &q = it is easily 

verified by a Taylor expansion of the function f(z) = 


/l+z 


as z ^ 0 that = \/fc 6 §(l - ^ + o(&o))- For 

2 , we write tfiis identity as 


convenience. 


=\/^oi^-Cbl), (186) 

where C is a constant depending on p such that C —4. 
Substituting ( |186| l into ( |185| l, we obtain 


/i =E 


H2 


Q(w^/klfo{l-Cbl)) 

-H2(q(w^\ 


(187) 


The next step is to Taylor expand the function $f(x) = H_2(Q(x))$.
For any $x$ and $\delta > 0$, we have

$$f(x - \delta) = f(x) + \frac{\delta}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\log\frac{1 - Q(x)}{Q(x)} + \frac{\delta^2}{2}f''(x - \delta_0) \tag{188}$$

for some $\delta_0 \in [0, \delta]$, where the middle term follows from
(179). Next, we claim that $f''$ in (180) is bounded as follows:

|/"(x)| < ^(1 + (189) 

In the case that x > 0, this is seen by applying Q{x) > 
^ 1 ~ ^ lo obtain the first term, 

and applying Q(x) < e~^ (and hence log = 

1 °S (q^ ~ 1 ) — ^^-1 obtain the second term (e.g., see 

| |48l for bounds on the Q-function). The case x < 0 follows 
since ( |180| l is symmetric about zero. 

Substituting ( |188| l in to (|187| i with the identifications x = 
Ws/kbl and 5 = W\/kbi(bl, we can write 

Ti-T2-T3<Ii<Ti+T2 + T3, (190) 


where 
Ti := CblE 


1 - Q{W^/kbo) 




Q(W^/Mo) 


T 2 := iCbiy^E .5^^(l + |l+|E^)e- 


n := m)E 


W^khl 

2 v^ 


(191) 


(192) 


\W\^(kbl)^/^e - 


(193) 

and where for T 2 and T 3 we used the fact that (5o G [0, in 
( |188| l to upper bound the corresponding terms by the value 
at 5o = 0 or i5o = 6. 

We will complete the proof by showing that Ti behaves as 
( |77| ) (with (T^ = 1), and that T 2 and T 3 behave as o( ^*°^^ ). 
Letting ff) denote the standard normal PDF, we have 


Ti = 



1 - Q{wy/^) 


Qiw^/ki^ 

_ ^ 

X e 2 c 

t \ , l-QU) G 1 
flog 


„ 2 f.h 2 

X e 2“^ dw (194) 


-e 2 


,_ -dt 

(195) 


Cbl 


^2'Kkbl 7-00 


1 

e 0 flog - —dt 


Q{t) 


(196) 


C 6 j 




1 'i-i 


xe-^(i+ij)^loglz^dt (197) 


bl 


yj2'Kkb\ 


:E 


VFlog 


1 - Qjw) 
Q{w) 


Q{t) 

(l + o(l)), (198) 


where ( |195|l follows by a change of variable of the form 
t = w^/kd^, ( |196| ) follows from the definition of f, and 
( |198| l follows since and since the integral in ( |197[ ) 

is the average of flog > 0 } over an A^( 0 , (1 + 
random variable; since $k b_0^2 \to \infty$, this converges
to the corresponding average over $W \sim N(0,1)$, which is
easily verified to be finite.

The terms T 2 and T 3 are handled similarly to Ti, so we 
only briefly comment on the analysis of T 3 . By the same 
arguments as those leading to (|196|l, we obtain 


2^^2nkbQ J- 00 V2Tr 

The integral is once again 0(1), and thus T 3 = 0 
which decays to zero strictly faster than P98|l. 



D. Proof of Proposition Part (iv)

We again assume without loss of generality that tr^ = 1. 
Defining Weq := ^dif := 

follows from ( |177| l that 


= log 


Qi-YjWdif + W,^)) 

Q{-YTWeq} 


( 200 ) 


where r := ^ and we implicitly condition on 

/3s = bs- Using ( [T731 )' and the fact that the variance is upper 
bounded by the second moment, we have 


l^dif ,®eq (^S ) 


< E 


Q{ - (lUdit + W'eq)) log 


Q( - (W^dif + W'eq)) 


Q(-rW'eq) 


, Q(l^dif + lUeq) 

+ + lUeg) ( log 


( 201 ) 


= 2E 


Q(lUdif + lUeq) log 


Q(Wdif + lU, 


eq; 


Q{tW, 


eqj 


, ( 202 ) 


where P 02 [ l follows since the distributions of Wdif and Weq 
are symmetric about zero, and the two are independent. 

The function $g(x) := -\log Q(x)$ is convex, and hence it
lies above any given tangent line. This implies that

$$|g(x_1) - g(x_2)| \le \max\{|g'(x_1)|, |g'(x_2)|\}\,|x_1 - x_2| \tag{203}$$

$$\le \big(|g'(x_1)| + |g'(x_2)|\big)|x_1 - x_2|, \tag{204}$$

where $g'(x) = \frac{\phi(x)}{Q(x)}$ is the derivative of $g$. Writing the
logarithm of the ratio in (202) as a difference of logarithms
and applying (204), we obtain


where, overloading the notation from part (iii), we define 


'■— J'J' /dif (tUdif)/eq(tUeq)Q(tUdif “t“ tUeq) 

^ I _ 

\ Q(rddif 


f fiwdii + Weq) 


(|wdif| + (1 - 'r)|ri;eq|) dWdifdWeq 

(206) 


+ Weq) 

3^2 ■= J'J' fdif{wdit)feq{Weq)Q{Wdif “t“ tUeq) 


fjrWeq) 

QirWeq) 


(|Wdif| + (1 - T)\Weq\ydWdi!dw, 


eq 


(207) 


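with $f_{\mathrm{dif}}$ and $f_{\mathrm{eq}}$ denoting the densities of $W_{\mathrm{dif}}$ and $W_{\mathrm{eq}}$.
The function $Q(x)\big(\frac{\phi(x)}{Q(x)}\big)^2$ lies between 0 and 2, and hence
$T_1 \le 2\mathbb{E}\big[(|W_{\mathrm{dif}}| + (1 - \tau)|W_{\mathrm{eq}}|)^2\big]$, yielding

$$T_1 = O\big(\mathbb{E}[W_{\mathrm{dif}}^2] + (1 - \tau)^2\mathbb{E}[W_{\mathrm{eq}}^2]\big). \tag{208}$$

We will further simplify this expression below, but we first
bound $T_2$, which requires more effort.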
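Two ingredients of the argument so far lend themselves to a numerical check: the tangent-line inequality $|g(x_1) - g(x_2)| \le \max\{|g'(x_1)|, |g'(x_2)|\}|x_1 - x_2|$ for $g(x) = -\log Q(x)$, and the boundedness of $Q(x)(\phi(x)/Q(x))^2 = \phi(x)^2/Q(x)$. The sketch below is illustrative only; it assumes $\phi$ is the standard normal density, and the grid over $[-5, 5]$ and the tolerance are arbitrary choices.

```python
import math


def Q(x: float) -> float:
    """Standard Gaussian tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def phi(x: float) -> float:
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def g(x: float) -> float:
    """g(x) = -log Q(x), a convex function."""
    return -math.log(Q(x))


def g_prime(x: float) -> float:
    """g'(x) = phi(x) / Q(x)."""
    return phi(x) / Q(x)


grid = [i / 4.0 for i in range(-20, 21)]  # x from -5 to 5 in steps of 0.25

# Largest violation of |g(x1) - g(x2)| <= max(|g'(x1)|, |g'(x2)|) |x1 - x2|.
worst = max(
    abs(g(x1) - g(x2)) - max(abs(g_prime(x1)), abs(g_prime(x2))) * abs(x1 - x2)
    for x1 in grid for x2 in grid
)

# Maximum over the grid of Q(x) * (phi(x) / Q(x))^2 = phi(x)^2 / Q(x).
ratio_max = max(phi(x) ** 2 / Q(x) for x in grid)
print(worst, ratio_max)
```

A non-positive value of `worst` (up to rounding) is consistent with the convexity argument, and `ratio_max` stays well inside the interval $[0, 2]$ used in the text.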

We split the integral over in P07| l according to 
whether Iwdifl < ||weq| or IwdifI > ^|weq|; the resulting 
expressions are denoted by Ti i and Ti 2 respectively. In 
each case, we use the following standard bounds on the Q- 
function (e.g., see ||48)): 


H'rWeq) ^ 
QirWeq) ~ 


Qiwdif + Weq) < 


1 + TWeq Weq > 0 

1 Weq < 0 


(209) 




(^dif+^eq)^ 

2 


1 


Wdif + Wgq > 0 

Wdif + Wgq < 0. 

( 210 ) 


To bound Ti i, we note that the condition |wdif | < ^|Weq| 
implies that sign(wdif + Weq) = sign(wgq), and hence only 
two of the four combinations of the cases in ( |209| l-( |2T0| ) 
can occur. When Weq < 0, we can use the second of each 
of these cases to upper bound the integrand in ( |207| i by 
/dit(wdif)/eq(weq)(|wdif| + (1 “ T)|wgq|) . On the Other 
hand, when Weq > 0 we can use the first of each of the 
cases to upper bound the integrand by 


/dif (wdif ) /eq (Wgq) 6 


('^dif+^eq) 


x(l+T|Wgq|) (|Wdif| + (1 - T)|Weq|) . (211) 


Again using the condition |wdif| < ^|wgq|, we find that 




’dif-l-™eq) ^ 1 , • ■ g • • 

e 2 < e s“'eq Since r < I by its dehmtion 

following P 00 [ ), it follows that e~ ^ 2 ^ + ''-Wgq)^ 

is upper bounded by a universal constant, and we are again 

left only with /dit(wdif)/eq(weq)(|wdif | + (I - 'r)|Wgq|) . 

Combining the two cases, we conclude that 


T2,i = 0(E[W^]i] + (1-t)^E[W,I]). 


Kdd.dqq(6d)<2(Ti+r2), 


(205) 


( 212 ) 
To upper bound $T_{2,2}$, we upper bound the integrand in
(207) by

/dif(Wdif)/eq(Weq)(l +T|Weq|)^(|Wdif| + (1 “ 'r)|Weq|)^ 

(213) 

— /dif (tUdif)/eq(tUeq) (l T 1^ (3|tUdif|) j 

(214) 

where ( |213| l follows by taking the higher of the two cases 
in both P09[ l and ( |210[ i, and ( |214[ i follows since |uidif| > 
4|u>eq| and r € [0,1]. It follows that 

r 2,2 = 0{E[Wlf] + E[W^,f]). (215) 

We now observe that the first two terms in ( [79l l account 
for all of the terms in ( |208[ l, ( |212| l and pi5| ) except for 
(1 — r)^E[Wg^ ]. Recalling that t = ^ ^ -p-, we see 

that (1 —r)^ = 0 ( 1 ) whenever X^iGsd f ~ ^( 1 )’ whereas 
a Taylor expansion yields (1 — r)^ = ©(( t )^) 

whenever X^iesd t “ ^(l)' Combining these cases, we 
obtain the third term in (IZD; recall that = 1 throughout 
this proof. 


Appendix D 

Proofs of Auxiliary Results for Noiseless 
Group Testing 

A. Proof of Proposition 

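As stated in |0 Eq. (36)], we have $I_\ell = \big(1 - \frac{\nu}{k}\big)^{k - \ell} H_2\big(\big(1 - \frac{\nu}{k}\big)^{\ell}\big)$,
where $H_2(\cdot)$ is the binary entropy
function. For $k \to \infty$ and $\frac{\ell}{k} \to \alpha$, we immediately
obtain (97) using the limits $\big(1 - \frac{\nu}{k}\big)^{k - \ell} \to e^{-(1 - \alpha)\nu}$
and $\big(1 - \frac{\nu}{k}\big)^{\ell} \to e^{-\alpha\nu}$ along with the continuity of the
binary entropy function. In the case that $\frac{\ell}{k} \to 0$, the
analogous limits are $\big(1 - \frac{\nu}{k}\big)^{k - \ell} \to e^{-\nu}$ and
$\big(1 - \frac{\nu}{k}\big)^{\ell} = 1 - \frac{\nu\ell}{k}(1 + o(1))$, and we obtain (96) using the fact that
$H_2(1 - \epsilon) = (-\epsilon\log\epsilon)(1 + o(1))$ as $\epsilon \to 0$. Note also that
$\log\frac{k}{\nu\ell} = \big(\log\frac{k}{\ell}\big)(1 + o(1))$ as $\frac{k}{\ell} \to \infty$.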
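The limiting behavior of the noiseless mutual information can be illustrated numerically. The sketch below assumes the closed form $I_\ell = (1 - \nu/k)^{k-\ell} H_2((1 - \nu/k)^{\ell})$ with natural logarithms, where each test includes each item independently with probability $\nu/k$; the function names and the particular values of $k$, $\ell$ and $\nu$ are hypothetical.

```python
import math


def H2(p: float) -> float:
    """Binary entropy in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def mutual_info_noiseless(k: int, ell: int, nu: float) -> float:
    """Exact expression (1 - nu/k)^(k - ell) * H2((1 - nu/k)^ell)."""
    base = 1.0 - nu / k
    return base ** (k - ell) * H2(base ** ell)


def limit_linear_regime(alpha: float, nu: float) -> float:
    """Limit when ell/k -> alpha: exp(-(1 - alpha) nu) * H2(exp(-alpha nu))."""
    return math.exp(-(1.0 - alpha) * nu) * H2(math.exp(-alpha * nu))


def approx_sparse_regime(k: int, ell: int, nu: float) -> float:
    """Leading-order form when ell/k -> 0: exp(-nu) * (nu ell / k) * log(k / ell)."""
    return math.exp(-nu) * (nu * ell / k) * math.log(k / ell)


nu = math.log(2.0)
exact_half = mutual_info_noiseless(10 ** 6, 5 * 10 ** 5, nu)
limit_half = limit_linear_regime(0.5, nu)
sparse_ratio = mutual_info_noiseless(10 ** 9, 1, nu) / approx_sparse_regime(10 ** 9, 1, nu)
print(exact_half, limit_half, sparse_ratio)
```

The ratio in the sparse regime approaches one only logarithmically fast, which is why the $1 + o(1)$ terms are carried through the analysis rather than dropped.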


Let Nq (respectively, Ni) be the random number of mea¬ 
surements such that = 0 and = 0 (respectively, 
Xg^q = 0 and 2fgj.j f 0). For any ei G (0,1), the above 
observations imply the following with probability one when 
p is sufficiently large; 

> A'l (log (1 - ei) + No/-{l - ei) (216) 

>7Vi(log^)(l-ei). (217) 

We also have from ( [96l l that + ei) 

for sufficiently large p. Combining these, we conclude that 

iVi > n\^e-''vUl - 82) 2? > nh{l - 82). 

1 - ei k 

(218) 

By considering the contrapositive statement, we have for any 
£2 > 0 and sufficiently large p that 


*"(Xg,,,;Y|Xg,q,6 g)<n/,(l-< 52 ) 

< P < ne~''v^{l — (52)(1 + £ 2 ) 

rv 


■ (219) 


By the observations at the start of this subsection, we have 
Ni ~ Binomial(n, ( 7 ) with q = + o(l)). We can 

thus further upper bound the right-hand of (|219|l by 


P[iVi< 71(7(1-52(1-63))] (220) 


for any £3 G ( 0 , 1 ) and sufficiently large p\ here we have 
used the fact that (1 — 52)(1 + o(l)) = (1 ~ <^ 2(1 + o(l))j 
since 82 is fixed. It follows from a standard Chernoff-based 
tail bound for Binomial random variables (e.g., see | |49l 
Sec. 4.1]) that 


XXg,,;Y|Xg 


, 6 g) < nh{l - 82 ) 


< g-'ri9((l-'52(l-e3)) log(l-52(l-e3))+<52(l-e3)) 

The proof is concluded by substituting q = 
and noting that £3 may be arbitrarily small. 


. ( 221 ) 
( 1 + 0 ( 1 )) 


B. Proof of Proposition 

We begin by evaluating the information density in ( | 2 T| l; 
for brevity, we write := i{Xs^,i;Y\Xs^^,bs) and := 
7 (Xgjjf; Y|Xgqq, 6 g). Recalling that Px Bernoulli(^), 
£ = o{k), and we are considering the noiseless case, we 
obtain the following: 

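1. We have $X_{s_{\mathrm{eq}}} \ne 0$ with probability $1 - \big(1 - \frac{\nu}{k}\big)^{k - \ell} = (1 - e^{-\nu})(1 + o(1))$,
and in this case the information density equals 0.

2. Given $X_{s_{\mathrm{eq}}} = 0$, we have $X_{s_{\mathrm{dif}}} \ne 0$ with probability
$1 - \big(1 - \frac{\nu}{k}\big)^{\ell} = \frac{\nu\ell}{k}(1 + o(1))$, and in this case the information density equals
$\log\frac{1}{1 - (1 - \frac{\nu}{k})^{\ell}} = \big(\log\frac{k}{\ell}\big)(1 + o(1))$.

3. Given $X_{s_{\mathrm{eq}}} = 0$, we have $X_{s_{\mathrm{dif}}} = 0$ with probability
$\big(1 - \frac{\nu}{k}\big)^{\ell} = 1 + o(1)$, and in this case the information density equals
$\log\frac{1}{(1 - \frac{\nu}{k})^{\ell}} = \frac{\nu\ell}{k}(1 + o(1))$.

The asymptotic identities given here follow from the assumption
$\ell = o(k)$, along with standard Taylor expansions.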


C. Proof of Proposition 

For the first part, we write (!^) 1 ^ 2 ^^) =■ 

Ti -f T 2 , where Ti sums the terms from 1 to [logfcj, and 
T 2 sums the terms from [logfcj -f 1 to LislfcJ- For 
of these, we upper bound the summation by the number of 
terms times the maximum term. 

For Ti, there are at most log A: terms, and we apply ( [99l l, 
with 82 / = ^ 2 ^^ The term (1 —5^^^) log(l — 52^^)-|-52 oan 
be made arbitrarily close to one by choosing 8 ^'^ to be suf¬ 
ficiently close to one. Writing log (^) = (flog j)(l-|-o(l)) 
and performing some simple rearrangements, we obtain the 
following condition for Ti —^ 0 : 


fclogf-f f loglog/c 

n > max 

e e~'^v 


(1 + 7i)) 


( 222 ) 


where rji may be arbitrarily small. Note that log log k arises 
as the logarithm of the number of terms in the summation. 
We obtain (101) by noting that this bound is minimized at
I = \ and writing k\ogk = (j^felog f)(l + o(l)), which 
follows from k = 0 (p®). 

For T 2 , a similar argument yields ( |222| ) with j log k in 
place of j log log fc; this follows by upper bounding the 
number of terms in the summation by k. Since i > \ogk, 
we have |log/c = 0 ( 1 ), and we conclude that T 2 —)■ 0 
provided that ( | 101 | i holds. 

Finally, for the second part of the proposition, we sub¬ 
stitute ( |100| ). By an analogous argument to that leading 
to (| 222 |, along with the scaling laws of If, in 

it is readily verified that it suffices that n 


n ( maxf 


»Ogf 


l + (f logf)2. 


with a sufficiently large implied con¬ 


stant. Using the fact that i > for this part, this reduces 
to n ( jQg ). Thus, any fl{k log k) scaling suffices, and 
the proof is concluded by noting that log/c = 0 (log |). 


Appendix E 
Noisy Group Testing 

Here we provide the relevant details for noisy group
testing, leading to Corollary 7. We focus our attention on
the parts that differ from the noiseless case. Throughout
the appendix, we use the notation $q_1 \star q_2 := q_1 q_2 + (1 - q_1)(1 - q_2)$.
We work with an arbitrary Bernoulli distribution
$P_X \sim \mathrm{Bernoulli}\big(\frac{\nu}{k}\big)$ to begin, and later substitute $\nu = \log 2$.

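Before proceeding, we analyze the values taken by the
information density $\imath^{\ell} := \imath(X_{s_{\mathrm{dif}}}; Y|X_{s_{\mathrm{eq}}}, b_s)$ (with $\ell := |s_{\mathrm{dif}}|$)
given in (21), under the model in (108):

1. We have $X_{s_{\mathrm{eq}}} \ne 0$ with probability $1 - \big(1 - \frac{\nu}{k}\big)^{k - \ell}$,
and in this case we have $\imath^{\ell} = 0$.

2. Given $X_{s_{\mathrm{eq}}} = 0$, we have the following, where we
define $\zeta := \big(1 - \frac{\nu}{k}\big)^{\ell}$:

• $X_{s_{\mathrm{dif}}} = 0 \cap Y = 0$ with probability $(1 - \rho)\zeta$, yielding
$\imath^{\ell} = \log\frac{1 - \rho}{(1 - \rho)\zeta + \rho(1 - \zeta)}$;

• $X_{s_{\mathrm{dif}}} = 0 \cap Y = 1$ with probability $\rho\zeta$, yielding
$\imath^{\ell} = \log\frac{\rho}{\rho\zeta + (1 - \rho)(1 - \zeta)}$;

• $X_{s_{\mathrm{dif}}} \ne 0 \cap Y = 0$ with probability $\rho(1 - \zeta)$, yielding
$\imath^{\ell} = \log\frac{\rho}{(1 - \rho)\zeta + \rho(1 - \zeta)}$;

• $X_{s_{\mathrm{dif}}} \ne 0 \cap Y = 1$ with probability $(1 - \rho)(1 - \zeta)$,
yielding $\imath^{\ell} = \log\frac{1 - \rho}{\rho\zeta + (1 - \rho)(1 - \zeta)}$.

In the case that $\ell = o(k)$, we can write $\zeta = 1 - \frac{\nu\ell}{k}(1 + o(1))$,
yielding the following simplifications:

1. The preceding four probabilities behave as $(1 - \rho)\big(1 - \frac{\nu\ell}{k}(1 + o(1))\big)$,
$\rho\big(1 - \frac{\nu\ell}{k}(1 + o(1))\big)$, $\rho\frac{\nu\ell}{k}(1 + o(1))$, and
$(1 - \rho)\frac{\nu\ell}{k}(1 + o(1))$.

2. The corresponding information densities behave as
$\frac{\nu\ell}{k}\frac{1 - 2\rho}{1 - \rho}(1 + o(1))$, $-\frac{\nu\ell}{k}\frac{1 - 2\rho}{\rho}(1 + o(1))$,
$-\log\frac{1 - \rho}{\rho}(1 + o(1))$ and $\log\frac{1 - \rho}{\rho}(1 + o(1))$. For example, the first
of these follows by writing $\log\frac{1 - \rho}{(1 - \rho)\zeta + \rho(1 - \zeta)} = \log\frac{1 - \rho}{1 - \rho - (1 - \zeta)(1 - 2\rho)}$,
dividing the numerator and denominator
by $1 - \rho$, and Taylor expanding the logarithm.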
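The case analysis above can be turned into a small numerical check. The sketch below is an illustration under stated assumptions: it models the noisy outcome as the noiseless OR of the tested items flipped with probability $\rho$, uses $\zeta = (1 - \nu/k)^{\ell}$, and the function names and parameter values are hypothetical. It verifies that the four conditional probabilities sum to one, that averaging the information density reproduces an entropy difference, and that the result is close to $e^{-\nu}\frac{\nu\ell}{k}(1 - 2\rho)\log\frac{1-\rho}{\rho}$ when $\ell/k$ is small.

```python
import math


def H2(p: float) -> float:
    """Binary entropy in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def noisy_cases(k: int, ell: int, nu: float, rho: float):
    """(probability, information density) pairs, conditioned on X_seq = 0."""
    zeta = (1.0 - nu / k) ** ell
    p_y0 = (1.0 - rho) * zeta + rho * (1.0 - zeta)   # P[Y = 0 | X_seq = 0]
    p_y1 = rho * zeta + (1.0 - rho) * (1.0 - zeta)   # P[Y = 1 | X_seq = 0]
    return [
        ((1.0 - rho) * zeta, math.log((1.0 - rho) / p_y0)),          # X_sdif = 0, Y = 0
        (rho * zeta, math.log(rho / p_y1)),                          # X_sdif = 0, Y = 1
        (rho * (1.0 - zeta), math.log(rho / p_y0)),                  # X_sdif != 0, Y = 0
        ((1.0 - rho) * (1.0 - zeta), math.log((1.0 - rho) / p_y1)),  # X_sdif != 0, Y = 1
    ]


def mutual_info_noisy(k: int, ell: int, nu: float, rho: float) -> float:
    """Average of the information density (it is zero whenever X_seq != 0)."""
    p_seq_zero = (1.0 - nu / k) ** (k - ell)
    return p_seq_zero * sum(p * d for p, d in noisy_cases(k, ell, nu, rho))


def asymptotic_noisy(k: int, ell: int, nu: float, rho: float) -> float:
    """Leading-order expression for ell/k -> 0."""
    return math.exp(-nu) * (nu * ell / k) * (1.0 - 2.0 * rho) * math.log((1.0 - rho) / rho)


k, ell, nu, rho = 10 ** 5, 1, math.log(2.0), 0.1   # hypothetical values
total_prob = sum(p for p, _ in noisy_cases(k, ell, nu, rho))
exact = mutual_info_noisy(k, ell, nu, rho)
zeta = (1.0 - nu / k) ** ell
entropy_form = (1.0 - nu / k) ** (k - ell) * (
    H2(rho * zeta + (1.0 - rho) * (1.0 - zeta)) - H2(rho))
print(total_prob, exact, entropy_form, asymptotic_noisy(k, ell, nu, rho))
```

The first-order contributions of the two cases with $X_{s_{\mathrm{dif}}} = 0$ cancel each other, so the leading term comes entirely from the two cases with $X_{s_{\mathrm{dif}}} \ne 0$.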


A. Analogs of Propositions [ 6 ]-^ 

The analog of Proposition is as follows. 

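Proposition 11. Under the noisy group testing setup in
Section IV-F, consider arbitrary sequences of sparsity levels
$k \to \infty$ and $\ell \in \{1, \dotsc, k\}$ (both indexed by $p$). If $\frac{\ell}{k} = o(1)$,
then

$$I_\ell = e^{-\nu}\frac{\nu\ell}{k}(1 - 2\rho)\log\frac{1 - \rho}{\rho}\,(1 + o(1)). \tag{223}$$

Moreover, if $\frac{\ell}{k} \to \alpha \in (0,1]$, then

$$I_\ell = e^{-(1 - \alpha)\nu}\big(H_2(e^{-\alpha\nu} \star \rho) - H_2(\rho)\big)(1 + o(1)). \tag{224}$$

Proof: We obtain (223) by recalling that the mutual
information is the average of the information density, and
applying the above-given asymptotic expansions, along with
$1 - \big(1 - \frac{\nu}{k}\big)^{k - \ell} \to 1 - e^{-\nu}$.

To prove (224), we write $I(X_{s_{\mathrm{dif}}}; Y|X_{s_{\mathrm{eq}}}) = H(Y|X_{s_{\mathrm{eq}}}) - H(Y|X_{s_{\mathrm{eq}}}, X_{s_{\mathrm{dif}}})$.
The system model (108)
immediately gives $H(Y|X_{s_{\mathrm{eq}}}, X_{s_{\mathrm{dif}}}) = H_2(\rho)$. Moreover,
a direct calculation reveals that $H(Y|X_{s_{\mathrm{eq}}} = x_{s_{\mathrm{eq}}})$
equals $H_2(\rho)$ if $x_{s_{\mathrm{eq}}}$ has an entry equal to one,
and $H_2(\zeta \star \rho)$ otherwise, where we again write
$\zeta := \big(1 - \frac{\nu}{k}\big)^{\ell}$. The proof is concluded by noting
that $\zeta \to e^{-\alpha\nu}$ when $\frac{\ell}{k} \to \alpha$, and by similarly noting that
$\mathbb{P}[X_{s_{\mathrm{eq}}} = 0] = \big(1 - \frac{\nu}{k}\big)^{k - \ell} \to e^{-(1 - \alpha)\nu}$. ■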

As in the noiseless case, we use Proposition to 
characterize ipf for £ > Lpil+J’ ip'f for £ = k. For 
£ < we instead use the following. 


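Proposition 12. Under the noisy group testing setup in
Section IV-F, consider sequences $k \to \infty$ and $\ell$, indexed
by $p$, such that $\frac{\ell}{k} \to 0$. For any $\epsilon > 0$ and $\delta_2 > 0$ not
depending on $p$, the following holds for sufficiently large $p$:

$$\mathbb{P}\big[\imath^n(\mathbf{X}_{s_{\mathrm{dif}}}; \mathbf{Y}|\mathbf{X}_{s_{\mathrm{eq}}}, b_s) \le nI_\ell(1 - \delta_2)\big] \le \exp\bigg(-n\frac{\ell}{k}e^{-\nu}\nu\,\frac{\delta_2^2(1 - 2\rho)^2}{2\big(1 + \frac{1}{3}\delta_2(1 - 2\rho)\big)}(1 - \epsilon)\bigg) \tag{225}$$

for all $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ with $|s_{\mathrm{dif}}| = \ell$.

Proof: We make use of the asymptotic identities for $\imath^{\ell}$
at the start of this appendix. We first note that by simple
averaging analogous to that used to obtain (223), we have
$v := \mathbb{E}[(\imath^{\ell})^2] = e^{-\nu}\frac{\nu\ell}{k}\big(\log^2\frac{1 - \rho}{\rho}\big)(1 + o(1))$. Moreover,
we have $|\imath^{\ell}| \le \big(\log\frac{1 - \rho}{\rho}\big)(1 + o(1))$ with probability one.
Using the form of Bernstein's inequality based on Bennett's
inequality [38, Sec. 2.7], we have $\mathbb{P}[\imath^n \le n(I_\ell - \delta)] \le \exp\Big(-\frac{n\delta^2}{2(v + \frac{1}{3}\delta M)}\Big)$,
where $M$ is any almost-sure upper bound on
$|\imath^{\ell}|$. Setting $\delta = \delta_2 I_\ell$, substituting (223) and the preceding
expressions for $v$ and $M$, and canceling the common terms
in the numerator and denominator, we obtain (225). ■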
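The cancellation in the last step of this proof can be verified numerically. The sketch below is illustrative: it assumes the leading-order expressions $I_\ell = a(1 - 2\rho)L$, $v = aL^2$ and $M = L$ with $a = e^{-\nu}\nu\ell/k$ and $L = \log\frac{1-\rho}{\rho}$ (the $1 + o(1)$ factors are dropped), and the parameter values are hypothetical. It compares the generic Bernstein exponent $\frac{\delta^2}{2(v + \delta M/3)}$ at $\delta = \delta_2 I_\ell$ with the simplified closed form.

```python
import math


def bernstein_exponent(delta: float, v: float, m: float) -> float:
    """Per-sample exponent delta^2 / (2 (v + delta m / 3))."""
    return delta ** 2 / (2.0 * (v + delta * m / 3.0))


def simplified_exponent(k: int, ell: int, nu: float, rho: float, delta2: float) -> float:
    """Closed form a * delta2^2 (1 - 2 rho)^2 / (2 (1 + delta2 (1 - 2 rho) / 3))."""
    a = math.exp(-nu) * nu * ell / k
    return a * delta2 ** 2 * (1.0 - 2.0 * rho) ** 2 / (
        2.0 * (1.0 + delta2 * (1.0 - 2.0 * rho) / 3.0))


# Hypothetical parameter values.
k, ell, nu, rho, delta2 = 10 ** 4, 3, math.log(2.0), 0.11, 0.4

a = math.exp(-nu) * nu * ell / k
log_ratio = math.log((1.0 - rho) / rho)
mutual_info = a * (1.0 - 2.0 * rho) * log_ratio   # leading-order I_ell
second_moment = a * log_ratio ** 2                # leading-order v
sup_bound = log_ratio                             # leading-order M

generic = bernstein_exponent(delta2 * mutual_info, second_moment, sup_bound)
closed = simplified_exponent(k, ell, nu, rho, delta2)
print(generic, closed)
```

The factor $L^2$ appears in both numerator and denominator, which is why the final exponent no longer depends on $\log\frac{1-\rho}{\rho}$.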

Letting ipf equal the right-hand side of ( |225| ) for £ < 
Lisffcj’ while being the same as in ( | 100 | ) for £ > 
we obtain the following. 
Proposition 13. Let $k = \Theta(p^{\theta})$ for some $\theta \in (0,1)$.

(i) For any rj > 0 and 62 G (0,1), there exists a choice of 
e > 0 m < [99] > such that 0 provided 


n > 


2(i+i<5,(i-2p))^ 
— 2pY 


(fclog 


|)(l + 77 ). (226) 


(ii) For any 82 G (0,1), we haveY^^i.^\+i S 2 ) -G- 

0 provided that n = Vl{k log 


Proof: The proof is nearly identical to that of Proposi¬ 
tion except that P25| l is used in place of ( |98] l, and 82 is 
kept arbitrary in the first part. ■ 

Note that the choices of ^2 in the two cases above need 
not coincide; see Remark 


B. Remaining Details in the Proof of Corollary^ 

Recall that we have set $\nu = \log 2$. This yields $e^{-\nu}\nu = \frac{\log 2}{2}$,
and thus the first term in (110) follows from (226).
Next, we consider the condition in ([37|i with i = |sdif| < 
Setting 7 = 0 , letting (5i —>■ 0 sufficiently slowly, 
applying Stirling’s approximation, and substituting P23| l, we 
obtain the condition 


n > max 
i 


fclog I + 2 fclog/c + 2 ^ log k 
e-''v{l - 2 p) log - ^ 2 ) 


(l + o(l)). (227) 


This is maximized for £ = 1, thus yielding the second term 
in ( | 1 10 [ l upon writing k log k = {k log |)(l + o(l)) and 
k\ogp= j 4 e(fclog 1){1 + 0 ( 1 )) (since k = 0 (p^)). 

Finally, we consider ( |J7| l with £ > Lki|i;J- this case, 
the numerator is dominated by the first term, and for the 
case that ^ a G (0, 1], we obtain the condition 


n > 


ak log f 


e-(i-“)'^(i72(e-“'^ *p) - iT2(p))(l - ^ 2 ) 


(1 


■ 0 ( 1 )), 

(228) 


where we have used ( |224| l. For the case that 0 with 
£ > LisffcJ, we obtain a condition of the form ( |227| l where 
only the first term of the numerator is kept. Such a condition 
is clearly dominated by ( 227| l. 

Using the result in |[TT Thm. 3a] in the limiting case that 
the number of defective items grows large, we have for the 
worst-case choice of a S [ 0 , 1 ] and an optimized choice 
of > 0 that the minimax threshold resulting from ( |228| l 
is obtained with a = \ and v = log 2. Substituting these 
values yields the second term in (|109]l. 


C. An Auxiliary Result for Comparing the Terms 

The following result allows us to compare the terms 
appearing in the achievability part of Corollary 

Proposition 14. For all $\rho \in (0, 0.5)$, we have

$$(1 - 2\rho)\log\frac{1 - \rho}{\rho} \ge 4\big(\log 2 - H_2(\rho)\big). \tag{229}$$

Proof: By some simple manipulations, the left-hand
side can be written as $\log\frac{1}{\rho(1 - \rho)} - 2H_2(\rho)$, and we may
thus equivalently prove that $\log\frac{1}{\rho(1 - \rho)} + 2H_2(\rho) \ge 4\log 2$.
This, in turn, can be verified by showing that the minimum
of the function $\log\frac{1}{\rho(1 - \rho)} + 2H_2(\rho)$ occurs at $\rho = 0.5$, i.e.,
the point about which it is symmetric. ■
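The inequality in Proposition 14 and the rewriting used in its proof are easy to probe numerically. The sketch below is a grid check, not a proof; it assumes natural logarithms, and the grid spacing and tolerance are arbitrary choices.

```python
import math


def H2(p: float) -> float:
    """Binary entropy in nats."""
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def lhs(rho: float) -> float:
    """(1 - 2 rho) log((1 - rho) / rho)."""
    return (1.0 - 2.0 * rho) * math.log((1.0 - rho) / rho)


def lhs_rewritten(rho: float) -> float:
    """Equivalent form log(1 / (rho (1 - rho))) - 2 H2(rho)."""
    return math.log(1.0 / (rho * (1.0 - rho))) - 2.0 * H2(rho)


def rhs(rho: float) -> float:
    """4 (log 2 - H2(rho))."""
    return 4.0 * (math.log(2.0) - H2(rho))


grid = [i / 1000.0 for i in range(1, 500)]  # rho in (0, 0.5)
min_gap = min(lhs(r) - rhs(r) for r in grid)
max_rewrite_error = max(abs(lhs(r) - lhs_rewritten(r)) for r in grid)
print(min_gap, max_rewrite_error)
```

Near $\rho = 0.5$ the two sides agree up to fourth order in $\rho - 0.5$, so the gap becomes very small there and a small tolerance is needed when checking its sign in floating point.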